Computer Vision · How-To · Beginner · 4 min read

How to Do Object Detection in Python for Computer Vision

To do object detection in Python for computer vision, you can use libraries like OpenCV with pre-trained models such as YOLO or SSD. These models detect objects by analyzing images and returning bounding boxes with labels and confidence scores.
📐

Syntax

Object detection in Python typically involves loading a pre-trained model, preparing the input image, running the model to detect objects, and then processing the output to draw bounding boxes and labels.

  • Load model: Load weights and configuration files.
  • Prepare input: Convert image to required format (e.g., blob).
  • Run detection: Pass input to the model to get detections.
  • Process output: Extract bounding boxes, class IDs, and confidence scores.
python
import cv2

# Load pre-trained YOLO model
net = cv2.dnn.readNetFromDarknet('yolov3.cfg', 'yolov3.weights')

# Load image
image = cv2.imread('image.jpg')

# Prepare input blob
blob = cv2.dnn.blobFromImage(image, 1/255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)

# Get output layer names
layer_names = net.getLayerNames()
# getUnconnectedOutLayers() returns a 1-D or 2-D array depending on
# the OpenCV version; flatten() handles both
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers().flatten()]

# Run forward pass
outputs = net.forward(output_layers)

# Process outputs to get bounding boxes, confidences, class IDs
# (see the full Example below for the processing loop)
💻

Example

This example shows how to detect objects in an image using OpenCV and YOLOv3. It loads the model, processes the image, detects objects, and draws bounding boxes with labels.

python
import cv2
import numpy as np

# Load YOLO
net = cv2.dnn.readNetFromDarknet('yolov3.cfg', 'yolov3.weights')
with open('coco.names', 'r') as f:
    classes = [line.strip() for line in f]

# Load image
img = cv2.imread('image.jpg')
height, width, _ = img.shape

# Create blob
blob = cv2.dnn.blobFromImage(img, 1/255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)

# Get output layers
layer_names = net.getLayerNames()
# flatten() handles both the 1-D and 2-D return shapes across OpenCV versions
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers().flatten()]

# Forward pass
outs = net.forward(output_layers)

class_ids = []
confidences = []
bboxes = []

# Process detections
for out in outs:
    for detection in out:
        scores = detection[5:]  # elements 0-3 are the box, 4 is objectness, 5+ are class scores
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > 0.5:
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            bboxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)

# Non-max suppression to remove overlaps
indexes = cv2.dnn.NMSBoxes(bboxes, confidences, 0.5, 0.4)

# Draw bounding boxes
font = cv2.FONT_HERSHEY_PLAIN
if len(indexes) > 0:  # NMSBoxes returns an empty tuple when nothing survives
    for i in np.array(indexes).flatten():
        x, y, w, h = bboxes[i]
        label = str(classes[class_ids[i]])
        confidence = confidences[i]
        color = (0, 255, 0)
        cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
        cv2.putText(img, f'{label} {confidence:.2f}', (x, y - 5), font, 1, color, 1)

# Save or show image
cv2.imwrite('output.jpg', img)
print('Object detection complete, output saved as output.jpg')
Output
Object detection complete, output saved as output.jpg
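The coordinate math in the processing loop above (normalized center and size to a top-left pixel box) is easy to get wrong, so it can help to check it in isolation. The helper name and sample values below are made up for illustration:

```python
def to_corner_box(detection, width, height):
    """Convert a YOLO-style (cx, cy, w, h) detection, normalized to [0, 1],
    into a top-left (x, y, w, h) box in pixels."""
    center_x = int(detection[0] * width)
    center_y = int(detection[1] * height)
    w = int(detection[2] * width)
    h = int(detection[3] * height)
    x = int(center_x - w / 2)
    y = int(center_y - h / 2)
    return [x, y, w, h]

# A detection centered in a 416x416 image, covering half its width and height
print(to_corner_box([0.5, 0.5, 0.5, 0.5], 416, 416))  # [104, 104, 208, 208]
```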
⚠️

Common Pitfalls

  • Wrong model files: Using incompatible or missing config and weights files causes errors.
  • Incorrect input size: Not resizing images to model's expected size (e.g., 416x416) reduces accuracy.
  • Ignoring confidence threshold: Not filtering low-confidence detections leads to many false positives.
  • Skipping non-max suppression: Overlapping boxes clutter results without this step.
  • File paths: Incorrect paths to model or image files cause failures.
python
import cv2

# Wrong way: Not resizing image or creating blob properly
image = cv2.imread('image.jpg')
net = cv2.dnn.readNetFromDarknet('yolov3.cfg', 'yolov3.weights')
net.setInput(image)  # Incorrect: should create blob first
outputs = net.forward()  # Raises an error or produces garbage results

# Right way:
blob = cv2.dnn.blobFromImage(image, 1/255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())
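To make the difference concrete: blobFromImage scales pixel values, resizes, optionally swaps the B and R channels, and reorders the image from HxWxC to a 1xCxHxW batch. A rough pure-NumPy sketch of those steps (ignoring resizing, and using a toy array rather than a real image; the function name is made up) looks like this:

```python
import numpy as np

def blob_from_image_sketch(image, scale, swap_rb=True):
    """Approximate what cv2.dnn.blobFromImage does, minus resizing:
    scale pixel values, optionally swap B and R channels, and
    reorder from HxWxC to 1xCxHxW."""
    img = image.astype(np.float32) * scale
    if swap_rb:
        img = img[:, :, ::-1]          # BGR -> RGB
    blob = img.transpose(2, 0, 1)      # HWC -> CHW
    return blob[np.newaxis, ...]       # add the batch dimension

toy = np.zeros((416, 416, 3), dtype=np.uint8)
print(blob_from_image_sketch(toy, 1 / 255.0).shape)  # (1, 3, 416, 416)
```

Feeding the raw HxWxC image to setInput skips all of these steps, which is why the model either errors out or sees meaningless input.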
📊

Quick Reference

  • Use cv2.dnn.readNetFromDarknet() to load YOLO models.
  • Prepare input with cv2.dnn.blobFromImage() resizing to 416x416.
  • Run detection with net.forward() on output layers.
  • Filter detections by confidence > 0.5.
  • Apply non-max suppression with cv2.dnn.NMSBoxes() to remove overlaps.
  • Draw bounding boxes with cv2.rectangle() and labels with cv2.putText().
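The steps above lean on cv2.dnn.NMSBoxes to remove overlaps. A simplified pure-Python version of the same greedy IoU-based algorithm (a sketch to show what the call does, not a replacement for it; the box values are made up) looks like this:

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    ix = max(0, min(ax2, bx2) - max(a[0], b[0]))
    iy = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def nms(boxes, scores, score_thr=0.5, iou_thr=0.4):
    """Greedy non-max suppression: keep the highest-scoring box,
    drop any remaining box that overlaps it too much, repeat."""
    order = sorted((i for i, s in enumerate(scores) if s > score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep

boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [300, 300, 50, 50]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 and is suppressed
```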

Key Takeaways

  • Use pre-trained models like YOLO with OpenCV's DNN module for easy object detection in Python.
  • Always preprocess images by creating a blob with the correct size and scale before detection.
  • Filter detections by confidence and apply non-max suppression to get clean results.
  • Ensure correct paths and compatible model files to avoid errors.
  • Drawing bounding boxes and labels helps visualize detected objects clearly.