Computer Vision · How-To · Beginner · 4 min read

Real-Time Object Detection in Computer Vision: How to Implement

To do real-time object detection in computer vision, run a pre-trained YOLO or SSD model on a video stream, processing each frame quickly enough to keep up with the feed. Libraries like OpenCV handle video capture and can display detections with minimal delay.
📐 Syntax

Real-time object detection typically involves these steps:

  • Load Model: Load a pre-trained detection model such as YOLO or SSD.
  • Capture Video: Open a video source such as a webcam.
  • Process Frames: Run the model on each video frame to detect objects.
  • Display Results: Draw bounding boxes and labels on detected objects and show the frame.

This cycle repeats continuously for real-time detection.

python
import cv2
import numpy as np

# Load YOLO model
net = cv2.dnn.readNet('yolov3.weights', 'yolov3.cfg')

# Load class names
with open('coco.names', 'r') as f:
    classes = [line.strip() for line in f.readlines()]

# Set up video capture
cap = cv2.VideoCapture(0)  # 0 for webcam

while True:
    ret, frame = cap.read()
    if not ret:
        break

    height, width, _ = frame.shape
    blob = cv2.dnn.blobFromImage(frame, 1/255, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)

    # Get output layer names
    layer_names = net.getUnconnectedOutLayersNames()
    outputs = net.forward(layer_names)

    # Process outputs to detect objects (simplified)
    for output in outputs:
        for detection in output:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > 0.5:
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)

                x = int(center_x - w / 2)
                y = int(center_y - h / 2)

                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
                cv2.putText(frame, classes[class_id], (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0,255,0), 2)

    cv2.imshow('Real Time Object Detection', frame)
    if cv2.waitKey(1) & 0xFF == 27:  # ESC key to exit
        break

cap.release()
cv2.destroyAllWindows()
💻 Example

This example shows how to detect objects from your webcam in real time using YOLOv3 and OpenCV. It loads the model, captures video frames, detects objects, and draws labeled boxes.

python
import cv2
import numpy as np

# Load YOLOv3 model and config files
net = cv2.dnn.readNet('yolov3.weights', 'yolov3.cfg')

# Load COCO class labels
with open('coco.names', 'r') as f:
    classes = [line.strip() for line in f.readlines()]

# Initialize webcam
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    height, width, _ = frame.shape
    blob = cv2.dnn.blobFromImage(frame, 1/255, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)

    output_layers = net.getUnconnectedOutLayersNames()
    detections = net.forward(output_layers)

    boxes = []
    confidences = []
    class_ids = []

    for output in detections:
        for detection in output:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > 0.5:
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)

                x = int(center_x - w / 2)
                y = int(center_y - h / 2)

                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)

    # Non-max suppression to remove overlapping boxes
    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)

    for i in np.array(indexes).flatten():  # np.array handles the empty-result case across OpenCV versions
        x, y, w, h = boxes[i]
        label = classes[class_ids[i]]
        confidence = confidences[i]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, f'{label} {confidence:.2f}', (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0,255,0), 2)

    cv2.imshow('YOLO Real Time Detection', frame)

    if cv2.waitKey(1) & 0xFF == 27:  # ESC key
        break

cap.release()
cv2.destroyAllWindows()

Output

A window opens showing the live webcam feed, with green boxes and labels drawn around detected objects in real time.
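One way to check that the pipeline is actually keeping up is to measure its frame rate. A minimal sketch, with an illustrative `rolling_fps` helper (not part of OpenCV) applied to the most recent frame timestamps:

```python
import time
from collections import deque

def rolling_fps(timestamps):
    """Average frames per second over a sequence of frame timestamps."""
    if len(timestamps) < 2:
        return 0.0
    elapsed = timestamps[-1] - timestamps[0]
    return (len(timestamps) - 1) / elapsed if elapsed > 0 else 0.0

# Inside the capture loop you could keep the last 30 timestamps:
times = deque(maxlen=30)
for _ in range(5):                 # stand-in for cap.read() iterations
    times.append(time.perf_counter())
    time.sleep(0.01)               # simulate per-frame detection work
print(f"approx FPS: {rolling_fps(list(times)):.1f}")
```

If the measured rate falls well below the camera's rate, the detection step is the bottleneck.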
⚠️ Common Pitfalls

Some common mistakes in real-time object detection include:

  • Using large models: Heavy models cause slow detection and lag.
  • Not resizing frames: The model expects a fixed input size (e.g., 416×416); pass that size to cv2.dnn.blobFromImage so each frame is resized before inference.
  • Ignoring non-max suppression: Without it, multiple overlapping boxes appear for the same object.
  • Not handling video stream properly: Forgetting to release the camera or destroy windows causes resource leaks.

Always optimize model size and frame processing for smooth real-time performance.

python
import cv2

# Wrong: the raw frame is displayed as-is -- no blob conversion,
# no model inference, and no non-max suppression, so nothing is detected
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Missing: blobFromImage resize, net.forward, NMS, box drawing
    cv2.imshow('Wrong', frame)
    if cv2.waitKey(1) & 0xFF == 27:
        break
cap.release()
cv2.destroyAllWindows()

# Right: convert each frame to a resized blob, run the model, and apply
# non-max suppression (see the full example above)
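The non-max suppression step can be illustrated with a small pure-Python sketch. This is not OpenCV's implementation of cv2.dnn.NMSBoxes, just the underlying idea: visit boxes in decreasing confidence order and drop any box that overlaps an already-kept box beyond an IoU threshold.

```python
def iou(a, b):
    """Intersection-over-union of two [x, y, w, h] boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def nms(boxes, confidences, iou_thresh=0.4):
    """Return indices of boxes kept after non-max suppression."""
    order = sorted(range(len(boxes)), key=lambda i: confidences[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not heavily overlap any kept box
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep

# Two near-identical boxes: only the higher-confidence one survives
boxes = [[10, 10, 50, 50], [12, 12, 50, 50], [200, 200, 40, 40]]
confs = [0.9, 0.6, 0.8]
print(nms(boxes, confs))  # → [0, 2]
```

Without this step, every anchor that fires above the confidence threshold would draw its own box around the same object.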
📊 Quick Reference

Tips for real-time object detection:

  • Use lightweight models like YOLOv3-tiny or MobileNet SSD for faster speed.
  • Resize input frames to model's expected size (e.g., 416x416 for YOLO).
  • Apply non-max suppression to reduce duplicate boxes.
  • Use GPU acceleration if available for better performance.
  • Release video capture and close windows properly to avoid crashes.
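The lightweight-model and GPU tips can be combined via OpenCV's DNN backend settings. This is a configuration sketch: the yolov3-tiny file names are placeholders for whichever model you use, and the CUDA backend requires an OpenCV build compiled with CUDA support (otherwise the DNN module falls back to the CPU):

```python
import cv2

# Load a lightweight model variant for speed (file names are placeholders)
net = cv2.dnn.readNet('yolov3-tiny.weights', 'yolov3-tiny.cfg')

# Request GPU inference; this takes effect only when OpenCV was
# built with CUDA support, otherwise the default CPU backend is used
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
```

Set these once after loading the model; the rest of the detection loop is unchanged.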

Key Takeaways

Use pre-trained models like YOLO or SSD with OpenCV for real-time object detection.
Resize video frames to the model's input size and apply non-max suppression for accurate, fast detection.
Choose a lightweight model and use GPU acceleration to reduce lag in real-time processing.
Always release the camera and close windows to manage resources properly.