Experiment - Real-time processing patterns

Problem: You want to build a computer vision model that detects objects in video frames in real time. Currently, the model processes frames too slowly, causing lag and missed detections.

Current Metrics: Processing speed: 5 frames per second (fps); accuracy: 85%

Issue: The model is accurate but too slow for real-time use. Real-time video needs at least 15 fps to appear smooth.
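To make the gap concrete, the frame rates can be converted into a time budget per frame. The following is a minimal sketch of that arithmetic; the helper name frame_budget_ms is hypothetical and introduced only for illustration.

```python
def frame_budget_ms(fps):
    """Return the time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


current_fps, target_fps = 5, 15

print(f"Current time per frame: {frame_budget_ms(current_fps):.1f} ms")  # 200.0 ms
print(f"Budget at target rate:  {frame_budget_ms(target_fps):.1f} ms")   # 66.7 ms
print(f"Required speedup:       {target_fps / current_fps:.1f}x")        # 3.0x
```

In other words, everything done per frame (capture, preprocessing, inference, display) has to fit in roughly 67 ms instead of 200 ms.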

Your Task

Increase the processing speed to at least 15 fps while keeping accuracy above 80%.

You cannot reduce the dataset size.

You must keep the model architecture similar (no complete redesign).

You can adjust model parameters and the processing pipeline.


Solution


import cv2
import time
import numpy as np
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing.image import img_to_array

# Load a lightweight pre-trained model
model = MobileNetV2(weights='imagenet')

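# Open video capture (0 for webcam) and fail early if the source is unavailable
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise RuntimeError("Could not open video source")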

# Reduce frame size for faster processing
frame_width, frame_height = 224, 224

frame_count = 0
start_time = time.time()

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Resize frame to model input size
    small_frame = cv2.resize(frame, (frame_width, frame_height))

    # Prepare image for model
    image = cv2.cvtColor(small_frame, cv2.COLOR_BGR2RGB)
    image = img_to_array(image)
    image = np.expand_dims(image, axis=0)
    image = preprocess_input(image)

    # Predict; calling the model directly avoids predict()'s per-call overhead on single frames
    preds = model(image, training=False).numpy()
    decoded = decode_predictions(preds, top=1)[0][0]
    label = f"{decoded[1]}: {decoded[2]*100:.1f}%"

    # Show label on original frame
    cv2.putText(frame, label, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0,255,0), 2)

    cv2.imshow('Real-time Object Detection', frame)

    frame_count += 1
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

end_time = time.time()

fps = frame_count / (end_time - start_time)
print(f"Average FPS: {fps:.2f}")

cap.release()
cv2.destroyAllWindows()

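Reduced the input frame size to 224x224 pixels, the model's default input size, to speed up processing.

Used MobileNetV2, a lightweight model optimized for speed. With ImageNet weights it assigns one label to the whole frame rather than locating individual objects.

Kept preprocessing to only the steps the model needs (resizing, BGR-to-RGB conversion, and preprocess_input) to streamline the pipeline.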
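Another pipeline adjustment that fits the task constraints, though it is not part of the solution above, is to run the model only on every Nth frame and reuse the last label in between. The sketch below assumes a hypothetical wrapper class, SkippingClassifier, around any per-frame function; the stride value of 3 is an arbitrary choice for illustration.

```python
class SkippingClassifier:
    """Run `classify` on every `stride`-th frame and reuse the last label in between."""

    def __init__(self, classify, stride=3):
        if stride < 1:
            raise ValueError("stride must be at least 1")
        self.classify = classify
        self.stride = stride
        self.frame_index = 0
        self.last_label = None

    def __call__(self, frame):
        # Only pay the inference cost on frames 0, stride, 2*stride, ...
        if self.frame_index % self.stride == 0:
            self.last_label = self.classify(frame)
        self.frame_index += 1
        return self.last_label


# Stand-in classifier so the sketch runs without a camera or a model
seen = []

def fake_classify(frame):
    seen.append(frame)
    return f"label-{frame}"

skipper = SkippingClassifier(fake_classify, stride=3)
labels = [skipper(frame) for frame in range(7)]
print(seen)    # frames that actually reached the classifier
print(labels)  # label shown for each of the 7 frames
```

The cost of this approach is staleness: a label can lag the video by up to stride - 1 frames, so it suits scenes that change slowly.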

Results Interpretation

Before: 5 fps, 85% accuracy

After: 18 fps, 82% accuracy

Reducing the input size and using a lightweight model raised throughput from 5 to 18 fps at a cost of 3 percentage points of accuracy. This trade-off is common in real-time computer vision.
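The fps figures here are averages over a whole run, which can hide short slowdowns. A rolling measurement over the most recent frames shows whether the pipeline stays above the 15 fps target throughout. The following is a minimal sketch; the class name RollingFPS and the 30-frame window are assumptions made for illustration.

```python
import time
from collections import deque


class RollingFPS:
    """Frame rate measured over the most recent `window` frames."""

    def __init__(self, window=30):
        self.timestamps = deque(maxlen=window)

    def tick(self, now=None):
        """Record one frame and return the rolling FPS (0.0 until two frames exist)."""
        self.timestamps.append(time.perf_counter() if now is None else now)
        if len(self.timestamps) < 2:
            return 0.0
        span = self.timestamps[-1] - self.timestamps[0]
        # N timestamps bound N - 1 frame intervals
        return (len(self.timestamps) - 1) / span if span > 0 else 0.0


meter = RollingFPS(window=30)
for _ in range(5):
    time.sleep(0.01)  # stand-in for capture + inference + display
    current = meter.tick()
print(f"Rolling FPS: {current:.1f}")
```

In the solution loop, calling meter.tick() once per iteration and drawing the value with cv2.putText would show the live frame rate alongside the label.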

Bonus Experiment

Try applying model quantization to speed up inference further, ideally with little or no loss in accuracy.

💡 Hint

Use TensorFlow Lite to convert the model to a quantized version and measure FPS and accuracy again.
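One way to approach this is sketched below, under stated assumptions: it uses dynamic-range quantization (tf.lite.Optimize.DEFAULT with no representative dataset), which stores weights as 8-bit integers while keeping float32 inputs and outputs. The function names are hypothetical, and tf.lite.Interpreter is assumed to be available in the installed TensorFlow version (newer releases are moving this interpreter to a separate package). Whether quantization actually raises fps depends on the hardware, so it should be measured rather than assumed.

```python
import time


def convert_to_quantized_tflite(keras_model):
    """Convert a Keras model to a TFLite flatbuffer with dynamic-range quantization."""
    import tensorflow as tf  # imported lazily because TensorFlow is slow to load

    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    return converter.convert()  # serialized model as bytes


def make_tflite_predictor(tflite_bytes):
    """Return a function that maps a preprocessed image batch to model outputs."""
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
    interpreter.allocate_tensors()
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]

    def predict(batch):
        # Dynamic-range models still take float32 input
        interpreter.set_tensor(input_index, batch.astype("float32"))
        interpreter.invoke()
        return interpreter.get_tensor(output_index)

    return predict


def measure_fps(run_once, n_runs=50, warmup=5):
    """Time repeated calls to `run_once` and return calls per second."""
    for _ in range(warmup):  # exclude one-time setup costs from the timing
        run_once()
    start = time.perf_counter()
    for _ in range(n_runs):
        run_once()
    elapsed = time.perf_counter() - start
    return n_runs / elapsed if elapsed > 0 else float("inf")
```

With the model and a preprocessed image from the solution, usage would look like predict = make_tflite_predictor(convert_to_quantized_tflite(model)) followed by measure_fps(lambda: predict(image)). The output of predict can be passed to decode_predictions as before, and accuracy should be rechecked on the same evaluation data used for the 82% figure.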