YOLO Object Detection Explained: How It Works and When to Use
YOLO (You Only Look Once) is an object detection method in computer vision that detects multiple objects in an image with a single pass through a neural network. It predicts bounding boxes and class probabilities simultaneously, which makes it efficient enough for real-time applications.
How It Works
YOLO treats object detection as a single regression problem: it predicts bounding boxes and class probabilities directly from the entire image in one pass. Think of glancing at a photo once and immediately knowing what objects are present and where they are, instead of scanning it piece by piece.
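The image is divided into an S × S grid, and the grid cell that contains an object's center is responsible for detecting that object. Each cell predicts a fixed number of bounding boxes, each with a confidence score. The score reflects both how likely it is that the box contains an object and how well the box fits that object (in the original paper, the probability of an object times the intersection over union, IoU, between the predicted box and the true box). The model also predicts class probabilities (such as person, car, or dog); in the original YOLO these are predicted once per grid cell, while later versions predict them for each box.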
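To make the grid concrete: in the original YOLO paper the network outputs a tensor of shape S × S × (B·5 + C), where S is the grid size, B the number of boxes per cell, and C the number of classes. With S = 7, B = 2 and C = 20 (PASCAL VOC) that is 7 × 7 × 30. The NumPy sketch below decodes such a tensor into pixel boxes. The per-cell layout it assumes (B groups of x, y, w, h, confidence, followed by C class probabilities), the function name `decode_yolov1`, and the 0.2 score threshold are illustrative assumptions, not the exact memory layout of any real implementation. It also assumes w and h are already fractions of the image size; the original model actually predicts their square roots, which would need squaring first. Each box's class-specific score is its confidence multiplied by the cell's class probability.

```python
import numpy as np

def decode_yolov1(output, S=7, B=2, C=20, img_w=448, img_h=448, threshold=0.2):
    """Decode a hypothetical S x S x (B*5 + C) YOLOv1-style output tensor.

    Assumed per-cell layout: B blocks of (x, y, w, h, confidence), then C class
    probabilities shared by all boxes in that cell.
    Returns a list of (x1, y1, x2, y2, class_id, score) tuples in pixels.
    """
    assert output.shape == (S, S, B * 5 + C)
    detections = []
    for row in range(S):
        for col in range(S):
            cell = output[row, col]
            class_probs = cell[B * 5:]  # one class distribution per cell
            for b in range(B):
                x, y, w, h, conf = (float(v) for v in cell[b * 5:(b + 1) * 5])
                # Class-specific score = box confidence * conditional class probability
                scores = conf * class_probs
                class_id = int(np.argmax(scores))
                score = float(scores[class_id])
                if score < threshold:
                    continue
                # x, y are offsets inside the cell; convert to image coordinates
                cx = (col + x) / S * img_w
                cy = (row + y) / S * img_h
                # w, h are assumed to be fractions of the whole image
                bw = w * img_w
                bh = h * img_h
                detections.append((cx - bw / 2, cy - bh / 2,
                                   cx + bw / 2, cy + bh / 2, class_id, score))
    return detections

if __name__ == "__main__":
    out = np.zeros((7, 7, 30))
    out[3, 3, 0:5] = [0.5, 0.5, 0.25, 0.25, 0.9]  # box 0 of the center cell
    out[3, 3, 10 + 14] = 1.0                       # that cell predicts class 14
    print(decode_yolov1(out))  # [(168.0, 168.0, 280.0, 280.0, 14, 0.9)]
```

Only the center cell produces a score above the threshold, so a single box survives; a real pipeline would then remove overlapping duplicates.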
This differs from two-stage detectors such as R-CNN, which first propose regions of interest and then classify each region separately. YOLO's single-stage process makes it very fast and suitable for real-time detection, for example in video streams or live camera feeds.
Example
This example uses the opencv-python library (OpenCV's dnn module) to load a pre-trained YOLOv3 model and detect objects in an image. It assumes the Darknet files yolov3.cfg, yolov3.weights and coco.names and an input image example.jpg are in the working directory. It removes duplicate boxes with non-maximum suppression and prints each detected object's class name and confidence score.
```python
import cv2
import numpy as np

# Load the YOLOv3 network from its Darknet config and weights files
net = cv2.dnn.readNetFromDarknet('yolov3.cfg', 'yolov3.weights')

# Load the COCO class labels, one name per line
with open('coco.names', 'r') as f:
    classes = [line.strip() for line in f]

# Load the image and keep its size for scaling boxes back to pixels
image = cv2.imread('example.jpg')
if image is None:
    raise FileNotFoundError('example.jpg could not be read')
height, width = image.shape[:2]

# Scale pixels to [0, 1], resize to 416x416 and convert BGR to RGB
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)

# Forward pass through all YOLO output layers
outputs = net.forward(net.getUnconnectedOutLayersNames())

# Keep boxes whose best class score exceeds 0.5
class_ids, confidences, bboxes = [], [], []
for output in outputs:
    for detection in output:
        # Each row: relative center x, center y, width, height, objectness, class scores
        scores = detection[5:]
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > 0.5:
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            x = int(center_x - w / 2)  # top-left corner
            y = int(center_y - h / 2)
            bboxes.append([x, y, w, h])
            confidences.append(confidence)
            class_ids.append(class_id)

# Non-maximum suppression: drop boxes overlapping a higher-scoring one (IoU > 0.4)
keep = cv2.dnn.NMSBoxes(bboxes, confidences, 0.5, 0.4)

# Print the remaining detections
for i in np.array(keep).flatten():
    print(f"Detected {classes[class_ids[i]]} with confidence {confidences[i]:.2f}")
```
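YOLO typically predicts several overlapping boxes for the same object, so its raw detections are usually filtered with non-maximum suppression (NMS): keep the highest-scoring box, drop every remaining box whose IoU with it exceeds a threshold, and repeat with what is left. The NumPy sketch below shows greedy NMS from scratch. The function names and the 0.4 overlap threshold are illustrative choices, and boxes are assumed to be in the same (x, y, w, h) pixel format as the `bboxes` list in the example. In practice, OpenCV's `cv2.dnn.NMSBoxes` performs this step.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x, y, w, h) format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[0] + box[2], boxes[:, 0] + boxes[:, 2])
    y2 = np.minimum(box[1] + box[3], boxes[:, 1] + boxes[:, 3])
    # Overlap is zero when the boxes do not intersect
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = box[2] * box[3] + boxes[:, 2] * boxes[:, 3] - inter
    return inter / np.maximum(union, 1e-9)

def nms(bboxes, scores, iou_threshold=0.4):
    """Greedy non-maximum suppression; returns indices of the boxes to keep."""
    boxes = np.asarray(bboxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Discard remaining boxes that overlap the kept box too much
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

if __name__ == "__main__":
    boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [200, 200, 50, 50]]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.92, so it is suppressed; the third box does not overlap either and is kept.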
When to Use
YOLO is a good fit when you need fast and accurate object detection in real time. Typical applications include spotting pedestrians and other vehicles in self-driving systems, detecting intruders in security camera footage, and recognizing objects in augmented reality mobile apps.
Because YOLO processes the whole image in a single network evaluation, it is much faster than two-stage detectors, which makes it well suited to live video analysis or any application where speed matters. However, it can be less precise than slower, more detailed methods on very small objects, especially small objects that appear close together.
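One reason for this, described in the original YOLO paper, is that each grid cell predicts only a fixed number of boxes (B = 2 there) and a single set of class probabilities, so several small objects whose centers fall in the same cell compete for those few slots; the paper gives flocks of birds as an example. The sketch below assigns each object to the cell containing its center and flags cells that hold more objects than B. The helper names are hypothetical, and S = 7 with a 448 × 448 input (64-pixel cells) follows the original YOLO setup.

```python
from collections import Counter

def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return the (row, col) grid cell that contains an object's center."""
    col = min(int(cx / img_w * S), S - 1)  # clamp centers on the right/bottom edge
    row = min(int(cy / img_h * S), S - 1)
    return row, col

def crowded_cells(centers, img_w, img_h, S=7, B=2):
    """Map each cell holding more than B object centers to its object count."""
    counts = Counter(responsible_cell(cx, cy, img_w, img_h, S) for cx, cy in centers)
    return {cell: n for cell, n in counts.items() if n > B}

if __name__ == "__main__":
    # Three small birds close together in a 448 x 448 image
    birds = [(100, 100), (110, 105), (120, 90)]
    print(crowded_cells(birds, 448, 448))  # {(1, 1): 3}
```

All three centers land in cell (1, 1), which can output at most two boxes, so at least one bird cannot be detected.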
Key Points
- YOLO predicts bounding boxes and classes in one step, making it very fast.
- It divides the image into a grid and predicts boxes per grid cell.
- Best for real-time applications like video and live camera feeds.
- May trade some accuracy for speed, especially on small objects.
- Widely used in robotics, security, and mobile apps.