Real-Time Object Detection Using YOLOv3

[ machine-learning | academic ]

Summary

In this project, I conducted an experimental analysis of YOLOv3’s capability for real-time object detection, a critical component in autonomous driving systems. After an extensive review and comparison of leading real-time object detection algorithms, including SSD and previous YOLO versions, YOLOv3 emerged as the superior choice in terms of speed and accuracy. However, challenges remain in detecting small objects and addressing class imbalances, as evidenced by YOLOv3’s performance on the BDD100k dataset.

While YOLOv3 showed a promising overall mAP of 15.44, it particularly excelled at detecting cars with a 32.96 AP, albeit with difficulties recognizing distant vehicles and those obscured by headlights. Pedestrian detection also proved problematic, especially at longer ranges. To address these issues, data augmentation strategies were proposed, along with training on more balanced datasets such as KITTI.
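As a rough illustration of what such augmentation could look like (the exact strategies proposed in the paper may differ), here is a minimal sketch using tf.image:

```python
# A minimal augmentation sketch (illustrative; not the paper's exact recipe).
import tensorflow as tf

def augment(image: tf.Tensor) -> tf.Tensor:
    # Photometric jitter can help with nighttime scenes and headlight glare.
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    # Horizontal flips are label-safe for driving scenes, provided the
    # bounding boxes are mirrored accordingly (omitted here for brevity).
    image = tf.image.random_flip_left_right(image)
    return image
```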

Read the Full Paper · Read the Literature Review

Post-Project: Putting it on the Edge

After submitting the paper, I decided to explore the model a bit more and see if I could run it in real time on cheap, off-the-shelf hardware. One way to detect known objects in a video is to stream it to the cloud, where powerful hardware runs inference (i.e., detects any objects present) and informs the device of the results. While that works well for certain applications, sending data to a server requires stable and secure network access and involves a round-trip delay, which gets in the way when working with real-time data.

Considering these limitations, I am inclined to implement object detection directly on the edge. Doing so enables faster processing, reduces latency, and removes the need for constant internet connectivity. Running inference locally on the edge device not only makes the system more efficient for real-time applications but also enhances privacy and security, as sensitive data never needs to be transmitted over the network.

Object Detection Using CNNs

To clarify what I am trying to do: an object detector scans an image to identify all objects present, providing their labels and precise positions within the image. Essentially, it draws a bounding rectangle around each identified object.

Object detection: identify objects in an image and draw a rectangle around their locations. Source: medium.com/zylapp
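To make that concrete, here is a toy sketch of what a detector's output looks like and how it gets drawn onto a frame with OpenCV. The file names and detection values are invented for illustration.

```python
# Toy sketch: render (label, confidence, box) detections onto an image.
import cv2

image = cv2.imread("street.jpg")  # hypothetical test image

# A detector returns tuples like this one; the values here are made up.
detections = [("car", 0.91, (120, 80, 320, 240))]  # box = (x1, y1, x2, y2)

for label, score, (x1, y1, x2, y2) in detections:
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(image, f"{label} {score:.2f}", (x1, y1 - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

cv2.imwrite("street_annotated.jpg", image)
```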

Now, let’s take a brief look at how object detectors work.

SSD Network Architecture. Source: medium.com/@jonathan_hui

Without delving into the specifics (for which you can refer to my literature review), SSD comprises two main components:

1. A backbone network (VGG-16 in the original paper) that extracts feature maps from the input image.
2. A set of detection heads: small convolutional filters applied to feature maps at several scales, predicting class scores and box offsets for a fixed set of default boxes.
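As a structural sketch only, here is that two-part design expressed in Keras. The layer widths and the number of detection scales are assumptions for illustration, not the reference SSD configuration.

```python
# Structural sketch of an SSD-style detector: backbone + multi-scale heads.
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 10      # assumption: dataset-dependent
BOXES_PER_CELL = 4    # assumption: default boxes per feature-map cell

def build_ssd_sketch(input_shape=(300, 300, 3)):
    inputs = tf.keras.Input(shape=input_shape)

    # Component 1: backbone CNN producing progressively smaller feature maps.
    x = inputs
    feature_maps = []
    for filters in (64, 128, 256, 512):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPool2D()(x)
        feature_maps.append(x)

    # Component 2: detection heads on several scales, each predicting class
    # scores and box offsets for every default box at every cell.
    outputs = []
    for fmap in feature_maps[-3:]:
        cls = layers.Conv2D(BOXES_PER_CELL * NUM_CLASSES, 3, padding="same")(fmap)
        loc = layers.Conv2D(BOXES_PER_CELL * 4, 3, padding="same")(fmap)
        outputs += [cls, loc]

    return tf.keras.Model(inputs, outputs)

model = build_ssd_sketch()
model.summary()
```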

The Setup

Performance benchmarks of state-of-the-art CNNs running on different hardware. Numbers are taken from the NVIDIA and Google Coral websites.

As of the time of writing, the two best algorithms for object detection on the edge are SSD and Tiny YOLO. The table above compares the performance of each algorithm on different hardware. Keep in mind that these numbers were published by the respective companies using highly optimized code; the average developer picking up these algorithms may only see about 60% of the quoted frame rates.

After much deliberation, I decided to settle on the newest Raspberry Pi 4, released in June this year, as the main board for running object detection algorithms. The Raspberry Pi 4 features a quad-core CPU running at 1.5 GHz (roughly 19 GFLOPS), 4 GB of RAM, and consumes only approximately 7.6 W of power. It also comes at a very affordable price tag of $75.

Testing hardware. The technical-details slides are my own.

However, the Raspberry Pi alone doesn't pack the punch necessary for heavy-duty computing, so I threw in a couple of Coral Edge TPUs. These are accelerators tailored to chew through tensor and matrix multiplications like nobody's business. Architecture-wise, they're a match made in heaven with the Raspberry Pi, sipping only about 500 mA of power, and they won't break the bank either, setting you back just $100 apiece.
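In practice, the Edge TPU is driven from Python through a TensorFlow Lite delegate. Here is a minimal sketch, assuming a quantized SSD-style .tflite model compiled for the Edge TPU; the model file name is hypothetical.

```python
# Minimal Edge TPU inference sketch via the TensorFlow Lite runtime.
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

interpreter = Interpreter(
    model_path="ssd_mobilenet_edgetpu.tflite",  # hypothetical model file
    experimental_delegates=[load_delegate("libedgetpu.so.1")])
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
height, width = inp["shape"][1], inp["shape"][2]

# Feed one dummy frame; a real pipeline would pass camera frames here.
frame = np.zeros((1, height, width, 3), dtype=np.uint8)
interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()

# SSD-style TFLite detection models output boxes, classes, and scores.
boxes = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
print(boxes.shape)
```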

Alright, to get this show on the road, we need eyes in the field: enter the Raspberry Pi camera. I picked it for a no-brainer reason: it's literally made for the Raspberry Pi. Plus, this camera is rated to capture 90 FPS at 640x480 resolution.
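A quick way to sanity-check that advertised 640x480 @ 90 FPS mode is the picamera library; the frame count below is an arbitrary choice for a one-second burst.

```python
# Sanity check: configure the Pi camera for 640x480 @ 90 FPS and grab frames.
import io
import time
import picamera

with picamera.PiCamera(resolution=(640, 480), framerate=90) as camera:
    time.sleep(2)  # let exposure and gain settle
    stream = io.BytesIO()
    start = time.time()
    # The video port trades image quality for capture speed.
    for i, _ in enumerate(camera.capture_continuous(
            stream, format="jpeg", use_video_port=True)):
        stream.seek(0)
        stream.truncate()
        if i >= 89:  # roughly one second's worth of frames at 90 FPS
            break
    print(f"~{(i + 1) / (time.time() - start):.1f} FPS")
```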

The Final Setup.

All that was left was to create a virtual environment on the Raspberry Pi and install Python 3, TensorFlow, and OpenCV to run my model.

Results

In the end, my cigarette-lighter adapter turned out to be defective and couldn't deliver nearly enough power to the entire setup, so I unplugged one of the Coral Edge TPUs. Even then, the setup was still starved for power and couldn't run optimally, though I only discovered this after watching the screen recording.

As I drove around downtown Montreal, I remotely accessed the Raspberry Pi from my MacBook using RealVNC, which was the only way I could control the Pi. To capture the whole run, I recorded my MacBook's screen, which is why the final video looks choppy: RealVNC dropped too many frames.

Figure 1: Daytime 1.
Figure 2: Daytime 2.
Figure 3: Nighttime.
Figure 4: Blurry/Out of focus.

Read the Full Paper · Read the Literature Review