YOLO Algorithm (You Only Look Once)

The YOLO (You Only Look Once) algorithm is one of the most significant breakthroughs in real-time object detection. Unlike traditional object detection models that use multiple region proposals, YOLO treats detection as a single regression problem. It predicts bounding boxes and class probabilities simultaneously, making it much faster and more efficient for real-time applications.

How YOLO Differs from R-CNN Approaches

Traditional object detection methods, such as R-CNN, Fast R-CNN, and Faster R-CNN, rely on region proposal networks to identify potential object locations before classification. This two-step process results in slower inference speeds. YOLO, on the other hand, divides the image into a grid and predicts bounding boxes and class probabilities for each grid cell in a single forward pass. This significantly reduces computation time, enabling real-time object detection.

YOLO Architecture and Grid-Based Predictions

YOLO splits an input image into an S × S grid. Each grid cell is responsible for detecting objects whose center falls within that cell. The model predicts bounding box coordinates (x, y, width, height) along with object confidence scores and class probabilities. Instead of scanning the image multiple times, YOLO processes the entire image in one pass through the neural network, making it highly efficient.
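
To make the grid idea concrete, the sketch below decodes the output for one cell of a YOLOv1-style prediction tensor. This is a minimal illustration rather than a real model: the values S = 7, B = 2 boxes per cell, and C = 20 classes follow the original YOLOv1 setup on PASCAL VOC, and the prediction tensor here is random stand-in data.

```python
import numpy as np

# Assumed YOLOv1-style settings: 7x7 grid, 2 boxes per cell, 20 classes.
S, B, C = 7, 2, 20

# The network outputs one tensor per image of shape S x S x (B*5 + C):
# each cell holds B boxes of (x, y, w, h, confidence) plus C class scores.
predictions = np.random.rand(S, S, B * 5 + C)  # stand-in for real network output

def decode_cell(cell_pred, row, col):
    """Convert one cell's raw prediction into image-relative boxes."""
    boxes = []
    for b in range(B):
        x, y, w, h, conf = cell_pred[b * 5 : b * 5 + 5]
        # (x, y) are offsets inside the cell; convert to coordinates in [0, 1]
        cx = (col + x) / S
        cy = (row + y) / S
        boxes.append((cx, cy, w, h, conf))
    class_probs = cell_pred[B * 5 :]  # class scores shared by the cell's boxes
    return boxes, class_probs

boxes, class_probs = decode_cell(predictions[3, 4], row=3, col=4)
print(boxes[0], class_probs.argmax())
```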

Loss Function and Class Confidence Scores

YOLO uses a custom loss function that balances three components (a simplified sketch follows the list):

  • Localization loss: measures the accuracy of bounding box predictions;
  • Confidence loss: ensures that predictions correctly indicate whether an object is present;
  • Classification loss: determines how well the predicted class matches the ground truth.
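
A simplified, single-cell version of this loss is sketched below. It mirrors the sum-squared-error structure of the original YOLOv1 loss (with λ_coord = 5 and λ_noobj = 0.5), but real implementations vectorize it over the whole grid and all predicted boxes; the dictionary-based inputs here are purely illustrative.

```python
import numpy as np

LAMBDA_COORD = 5.0   # weights localization errors more heavily
LAMBDA_NOOBJ = 0.5   # down-weights confidence errors in empty cells

def cell_loss(pred, target, object_present):
    """pred/target: dicts with keys x, y, w, h, conf, classes (np.array)."""
    if object_present:
        # Localization loss: squared error on centers and on sqrt of sizes
        loc = (pred["x"] - target["x"]) ** 2 + (pred["y"] - target["y"]) ** 2
        loc += (np.sqrt(pred["w"]) - np.sqrt(target["w"])) ** 2
        loc += (np.sqrt(pred["h"]) - np.sqrt(target["h"])) ** 2
        # Confidence loss: predicted confidence should match the target confidence
        conf = (pred["conf"] - target["conf"]) ** 2
        # Classification loss: squared error over class probabilities
        cls = np.sum((pred["classes"] - target["classes"]) ** 2)
        return LAMBDA_COORD * loc + conf + cls
    # No object in this cell: only penalize confidence, scaled down
    return LAMBDA_NOOBJ * pred["conf"] ** 2
```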

To enhance accuracy, YOLO applies anchor boxes and non-max suppression (NMS) to filter overlapping detections.
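
Non-max suppression itself is model-agnostic and easy to sketch: keep the highest-scoring box, drop any remaining box that overlaps it beyond an IoU threshold, and repeat. The function names and the 0.5 threshold below are illustrative defaults, not part of any specific YOLO release.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes, dropping overlaps above the threshold."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(best)
        # Discard remaining boxes that overlap the chosen box too much
        order = np.array([i for i in order[1:]
                          if iou(boxes[best], boxes[i]) < iou_threshold])
    return keep
```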

Advantages of YOLO: Speed vs. Accuracy Trade-Off

The main advantage of YOLO is speed. Since detection occurs in a single pass, YOLO is much faster than R-CNN-based methods, making it suitable for real-time applications like autonomous driving and surveillance. However, this speed comes at the cost of accuracy, especially for small objects or densely packed scenes. Later versions of YOLO have improved accuracy while maintaining high efficiency.

YOLO Versions: From 2015 to 2025

  • YOLOv1: introduced the grid-based approach but struggled with small objects;
  • YOLOv2 & YOLOv3: improved accuracy using anchor boxes and multi-scale detection;
  • YOLOv4: optimized architecture with CSPDarknet and additional enhancements;
  • YOLOv5: improved efficiency with a PyTorch implementation;
  • YOLOv6 & YOLOv7: further optimized for speed and accuracy;
  • YOLOv8: enhanced real-time performance and accuracy;
  • YOLOv9: outperforms earlier versions with better mean Average Precision (mAP) on the COCO dataset;
  • YOLOv10: offers lower latency and fewer parameters, making it more efficient;
  • YOLOv11: supports multiple tasks, including segmentation and keypoint detection, achieving significant mAP improvements;
  • YOLOv12: released in early 2025, enhances real-time detection with attention-centric mechanisms, reducing latency while improving accuracy (a short usage sketch follows this list).
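
For experimentation, recent YOLO versions are easiest to try through the ultralytics Python package, which distributes pretrained checkpoints for YOLOv8 and later. The snippet below is a minimal usage sketch, not the only way to run YOLO; it assumes the package is installed (pip install ultralytics) and that a local file named image.jpg exists.

```python
from ultralytics import YOLO

# Load a small pretrained checkpoint (downloaded automatically on first use)
model = YOLO("yolov8n.pt")

# Run detection in a single forward pass over the image
results = model("image.jpg")

for result in results:
    for box in result.boxes:
        print(box.xyxy, box.conf, box.cls)  # coordinates, confidence, class id
```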