Computer Vision · How-To · Beginner · 4 min read

How to Detect Objects in Video Using Computer Vision

To detect objects in video, use a pretrained object detection model such as YOLO or SSD and apply it frame by frame to the video stream. Extract each frame, run the model to get bounding boxes and class labels, then display or process the results in real time.

Syntax

Object detection in video typically follows these steps:

  • Load video: Open the video file or stream.
  • Read frames: Extract frames one by one.
  • Detect objects: Run a detection model on each frame.
  • Display or save: Show frames with detected boxes or save results.

Each step uses specific functions from libraries like OpenCV and deep learning frameworks.

python
import cv2
import torch

# Load video
cap = cv2.VideoCapture('video.mp4')

# Load pretrained model (YOLOv5 example)
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Convert BGR to RGB
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Detect objects
    results = model(rgb_frame)

    # Render results on the frame (returned in RGB order)
    annotated_frame = results.render()[0]

    # Show frame (convert back to BGR so OpenCV displays colors correctly)
    cv2.imshow('Object Detection', cv2.cvtColor(annotated_frame, cv2.COLOR_RGB2BGR))
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Output

A window opens showing the video frames with detected objects highlighted by bounding boxes and labels.

Example

This example uses OpenCV to read a video and a pretrained YOLOv5 model from PyTorch Hub to detect objects in each frame. It displays the video with bounding boxes and labels in real time.

python
import cv2
import torch

# Open video file or camera
cap = cv2.VideoCapture(0)  # Use 0 for webcam or replace with 'video.mp4'

# Load YOLOv5 small model pretrained on COCO dataset
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Convert BGR to RGB
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Run detection
    results = model(rgb_frame)

    # Draw bounding boxes and labels (annotated image is returned in RGB order)
    annotated_frame = results.render()[0]

    # Show the frame (convert back to BGR so OpenCV displays colors correctly)
    cv2.imshow('Video Object Detection', cv2.cvtColor(annotated_frame, cv2.COLOR_RGB2BGR))

    # Press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Output

A live video window showing detected objects with boxes and labels, updating frame by frame until 'q' is pressed.
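The steps above list "display or save" as the final stage, but the example only displays. A minimal sketch of the saving path, writing per-frame detections to a CSV file with Python's standard `csv` module; the detection rows here are hard-coded stand-ins for what a model like YOLOv5 would return (frame index, class label, confidence, box coordinates):

```python
import csv

# Stand-in detections: (frame_index, label, confidence, x1, y1, x2, y2).
# In the real loop these would come from the model's per-frame results.
detections = [
    (0, 'person', 0.91, 34, 50, 120, 300),
    (0, 'dog', 0.78, 200, 180, 310, 290),
    (1, 'person', 0.89, 36, 52, 122, 301),
]

# Write one header row plus one row per detection
with open('detections.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['frame', 'label', 'confidence', 'x1', 'y1', 'x2', 'y2'])
    writer.writerows(detections)

# Read it back to confirm the structure
with open('detections.csv', newline='') as f:
    rows = list(csv.reader(f))
```

Saving to a file like this lets you run detection offline and analyze or re-render the results later without re-running the model.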

Common Pitfalls

  • Slow processing: Running detection on every frame can be slow; consider skipping frames or using faster models.
  • Incorrect color order: OpenCV reads frames as BGR, while most models expect RGB; convert before running detection.
  • Resource limits: Video and model processing can use much CPU/GPU; optimize or use hardware acceleration.
  • Not releasing resources: Always release video capture and close windows to avoid crashes.
python
import cv2
import torch

cap = cv2.VideoCapture('video.mp4')
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Wrong: passing a BGR frame directly without conversion
    # results = model(frame)  # Red and blue channels are swapped, degrading accuracy

    # Correct: convert BGR to RGB
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = model(rgb_frame)

    annotated_frame = results.render()[0]
    # Convert back to BGR so OpenCV displays colors correctly
    cv2.imshow('Detection', cv2.cvtColor(annotated_frame, cv2.COLOR_RGB2BGR))

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
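The first pitfall, slow processing, is often handled by running the detector only on every Nth frame and reusing the last results in between. A minimal sketch of that pattern; the `detect` function here is a hypothetical stand-in for the model call, and the frame source stands in for `cv2.VideoCapture`:

```python
DETECT_EVERY = 3  # run the model on every 3rd frame

def detect(frame):
    """Stand-in for the real model call; returns fake boxes."""
    return [('object', 0.9, (0, 0, 10, 10))]

frames = range(10)   # stand-in for frames read from the video stream
last_results = []
detector_calls = 0

for i, frame in enumerate(frames):
    if i % DETECT_EVERY == 0:
        last_results = detect(frame)  # fresh detection
        detector_calls += 1
    # On skipped frames, reuse last_results for drawing or processing
```

With `DETECT_EVERY = 3`, the model runs on frames 0, 3, 6, and 9 only, roughly tripling throughput at the cost of boxes lagging slightly on fast-moving objects.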

Quick Reference

Tips for object detection in video:

  • Use pretrained models like YOLO, SSD, or Faster R-CNN for quick setup.
  • Process video frame-by-frame using OpenCV's VideoCapture.
  • Convert frames to the correct color format before detection.
  • Optimize speed by resizing frames or skipping frames.
  • Release resources properly to avoid memory leaks.
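Resizing frames speeds up detection, but the model then reports boxes in the resized coordinates, which must be scaled back to the original resolution before drawing. A small sketch of that arithmetic; the sizes and box values are illustrative:

```python
orig_w, orig_h = 1920, 1080  # original frame size
det_w, det_h = 640, 360      # size the frame was resized to for detection

# Scale factors mapping detection coordinates back to the original frame
sx, sy = orig_w / det_w, orig_h / det_h

def rescale_box(box):
    """Map a box (x1, y1, x2, y2) from detection coords to original coords."""
    x1, y1, x2, y2 = box
    return (round(x1 * sx), round(y1 * sy), round(x2 * sx), round(y2 * sy))

box_in_resized = (100, 50, 200, 150)
box_in_original = rescale_box(box_in_resized)
```

Here both scale factors are 3.0, so the box (100, 50, 200, 150) maps to (300, 150, 600, 450) on the full-resolution frame.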

Key Takeaways

  • Use pretrained object detection models to detect objects frame-by-frame in video streams.
  • Always convert video frames to the correct color format (usually RGB) before detection.
  • Optimize processing speed by resizing frames or skipping frames if detection is slow.
  • Release video and window resources properly to prevent crashes or freezes.
  • Display detection results by drawing bounding boxes and labels on each frame.