Real Time Object Detection in Computer Vision: How to Implement
To perform real-time object detection, run a pre-trained model such as YOLO or SSD on a video stream, processing each frame as it arrives. Libraries like OpenCV capture the video and display the detections with minimal delay.
Syntax
Real time object detection typically involves these steps:
- Load Model: Load a pre-trained detection model like YOLO or SSD.
- Capture Video: Use a video source such as a webcam.
- Process Frames: For each video frame, run the model to detect objects.
- Display Results: Draw bounding boxes and labels on detected objects and show the frame.
This cycle repeats continuously for real time detection.
python
import cv2
import numpy as np

# Load YOLO model
net = cv2.dnn.readNet('yolov3.weights', 'yolov3.cfg')

# Load class names
with open('coco.names', 'r') as f:
    classes = [line.strip() for line in f.readlines()]

# Set up video capture
cap = cv2.VideoCapture(0)  # 0 for webcam

while True:
    ret, frame = cap.read()
    if not ret:
        break

    height, width, _ = frame.shape
    blob = cv2.dnn.blobFromImage(frame, 1/255, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)

    # Get output layer names
    layer_names = net.getUnconnectedOutLayersNames()
    outputs = net.forward(layer_names)

    # Process outputs to detect objects (simplified)
    for output in outputs:
        for detection in output:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > 0.5:
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
                cv2.putText(frame, classes[class_id], (x, y - 10),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

    cv2.imshow('Real Time Object Detection', frame)
    if cv2.waitKey(1) & 0xFF == 27:  # ESC key to exit
        break

cap.release()
cv2.destroyAllWindows()
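The coordinate arithmetic inside the loop above can be isolated into a small helper. This is a minimal sketch that assumes the output layout used in the code above, where the first four entries of each detection are the box center and size as fractions of the frame. The function name `yolo_to_pixel_box` is hypothetical and not part of OpenCV.

```python
def yolo_to_pixel_box(detection, frame_width, frame_height):
    """Convert a normalized (cx, cy, w, h) detection to a pixel (x, y, w, h) box.

    Assumes the first four entries of `detection` are the box center and size,
    each expressed as a fraction of the frame dimensions.
    """
    center_x = int(detection[0] * frame_width)
    center_y = int(detection[1] * frame_height)
    w = int(detection[2] * frame_width)
    h = int(detection[3] * frame_height)
    # Shift from the box center to its top-left corner.
    x = int(center_x - w / 2)
    y = int(center_y - h / 2)
    return x, y, w, h


# A box centered in a 640x480 frame, half as wide and half as tall as the frame.
box = yolo_to_pixel_box([0.5, 0.5, 0.5, 0.5], 640, 480)
print(box)  # (160, 120, 320, 240)
```

Drawing functions such as `cv2.rectangle` expect corner coordinates, which is why the center is shifted by half the width and height.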
Example
This example shows how to detect objects from your webcam in real time using YOLOv3 and OpenCV. It loads the model, captures video frames, detects objects, and draws boxes with labels.
python
import cv2
import numpy as np

# Load YOLOv3 model and config files
net = cv2.dnn.readNet('yolov3.weights', 'yolov3.cfg')

# Load COCO class labels
with open('coco.names', 'r') as f:
    classes = [line.strip() for line in f.readlines()]

# Initialize webcam
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    height, width, _ = frame.shape
    blob = cv2.dnn.blobFromImage(frame, 1/255, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    output_layers = net.getUnconnectedOutLayersNames()
    detections = net.forward(output_layers)

    boxes = []
    confidences = []
    class_ids = []

    for output in detections:
        for detection in output:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > 0.5:
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)

    # Non-max suppression to remove overlapping boxes
    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)

    # NMSBoxes can return an empty tuple when nothing is detected,
    # so convert to an array before flattening.
    for i in np.array(indexes).flatten():
        x, y, w, h = boxes[i]
        label = classes[class_ids[i]]
        confidence = confidences[i]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, f'{label} {confidence:.2f}', (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

    cv2.imshow('YOLO Real Time Detection', frame)
    if cv2.waitKey(1) & 0xFF == 27:  # ESC key
        break

cap.release()
cv2.destroyAllWindows()
Output
A window opens showing the live webcam feed, with green boxes and labels drawn around detected objects in real time.
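To make the non-max suppression step in the example less of a black box, here is a minimal pure-Python sketch of the greedy idea behind it: keep the highest-scoring box, drop any remaining box that overlaps a kept box too much, and repeat. It illustrates the technique only and is not the implementation inside `cv2.dnn.NMSBoxes`. The names `iou` and `greedy_nms` are hypothetical, and the default thresholds simply mirror the 0.5 and 0.4 passed in the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def greedy_nms(boxes, confidences, score_threshold=0.5, iou_threshold=0.4):
    """Return indices of boxes kept after greedy non-max suppression."""
    # Consider only boxes above the score threshold, best first.
    order = sorted(
        (i for i, c in enumerate(confidences) if c > score_threshold),
        key=lambda i: confidences[i],
        reverse=True,
    )
    kept = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


# Two heavily overlapping boxes and one separate box.
boxes = [[100, 100, 50, 50], [105, 105, 50, 50], [300, 300, 40, 40]]
confidences = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, confidences))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, above the 0.4 threshold, so only the higher-scoring one survives.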
Common Pitfalls
Some common mistakes when doing real time object detection include:
- Using large models: Heavy models cause slow detection and lag.
- Not resizing frames: Feeding full-size frames slows processing; resizing to model input size is needed.
- Ignoring non-max suppression: Without it, multiple overlapping boxes appear for the same object.
- Not handling video stream properly: Forgetting to release the camera or destroy windows causes resource leaks.
Always optimize model size and frame processing for smooth real time performance.
python
import cv2

# Wrong: No frame resizing, no non-max suppression
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Directly use frame without resizing or blob conversion
    cv2.imshow('Wrong', frame)
    if cv2.waitKey(1) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()

# Right: Resize frame and use non-max suppression (see previous example)
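The resource-leak pitfall can be guarded against by wrapping the loop in `try`/`finally`, so the camera is released even if frame processing raises an exception. The sketch below is one possible arrangement, not a required pattern. `run_capture_loop` and `FakeCapture` are hypothetical names, and the fake camera exists only so the example runs without hardware.

```python
def run_capture_loop(cap, process_frame, cleanup=None):
    """Read frames until the source ends or `process_frame` returns False.

    `cap` is any object with `read()` and `release()` methods, such as a
    cv2.VideoCapture. The `finally` block guarantees the release happens
    even if `process_frame` raises.
    """
    frames_processed = 0
    try:
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            frames_processed += 1
            if process_frame(frame) is False:
                break
    finally:
        cap.release()
        if cleanup is not None:
            cleanup()
    return frames_processed


class FakeCapture:
    """Stand-in for a camera so the sketch runs without hardware."""

    def __init__(self, frames):
        self.frames = list(frames)
        self.released = False

    def read(self):
        if self.frames:
            return True, self.frames.pop(0)
        return False, None

    def release(self):
        self.released = True


cap = FakeCapture(["frame1", "frame2", "frame3"])
count = run_capture_loop(cap, process_frame=lambda frame: True)
print(count, cap.released)  # 3 True
```

With OpenCV, `cap` would be `cv2.VideoCapture(0)` and `cleanup` could be `cv2.destroyAllWindows`.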
Quick Reference
Tips for real time object detection:
- Use lightweight models like YOLOv3-tiny or MobileNet SSD for faster speed.
- Resize input frames to model's expected size (e.g., 416x416 for YOLO).
- Apply non-max suppression to reduce duplicate boxes.
- Use GPU acceleration if available for better performance.
- Release the video capture and close windows when finished to avoid resource leaks.
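To check whether these tips actually improve speed, it helps to measure frames per second before and after a change. The following is a minimal sketch of a rolling FPS estimate. The class name `FpsMeter` and its `window` and `clock` parameters are hypothetical, and the simulated clock is there only to make the example deterministic.

```python
import time


class FpsMeter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window=30, clock=time.perf_counter):
        self.window = window
        self.clock = clock
        self.timestamps = []

    def tick(self):
        """Record one processed frame and return the current FPS estimate."""
        self.timestamps.append(self.clock())
        # Keep only the most recent `window` timestamps.
        self.timestamps = self.timestamps[-self.window:]
        if len(self.timestamps) < 2:
            return 0.0
        elapsed = self.timestamps[-1] - self.timestamps[0]
        return (len(self.timestamps) - 1) / elapsed if elapsed > 0 else 0.0


# Simulated clock: one frame every 0.05 seconds.
fake_times = iter([0.0, 0.05, 0.10, 0.15])
meter = FpsMeter(window=30, clock=lambda: next(fake_times))
for _ in range(4):
    fps = meter.tick()
print(round(fps, 1))  # 20.0
```

In a detection loop, calling `meter.tick()` once per frame with the default clock gives a live estimate that could be drawn on the frame with `cv2.putText`.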
Key Takeaways
Use pre-trained models like YOLO or SSD with OpenCV for real time object detection.
Resize video frames and apply non-max suppression for accurate and fast detection.
Optimize model choice and use GPU acceleration to reduce lag in real time processing.
Always release camera and close windows to manage resources properly.