
How to Train a Custom Object Detection Model in Computer Vision

To train a custom object detection model, first collect and label images with bounding boxes using tools like LabelImg. Then prepare the dataset, choose a model architecture like YOLO or Faster R-CNN, train the model on your data, and finally evaluate its accuracy using metrics like mAP.

📐

Syntax

Training a custom object detection model typically follows these steps:

Data Collection: Gather images containing objects you want to detect.
Annotation: Label objects with bounding boxes and class names.
Data Preparation: Convert annotations to the required format (e.g., COCO, Pascal VOC).
Model Selection: Choose a detection architecture like YOLO, SSD, or Faster R-CNN.
Training: Use a deep learning framework (e.g., TensorFlow, PyTorch) to train the model on your dataset.
Evaluation: Measure performance using metrics like mean Average Precision (mAP).
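
The Data Preparation step can be sketched as follows, assuming annotations are stored as Pascal VOC XML. The class map `CLASS_TO_INDEX`, the helper name `parse_voc_annotation`, and the sample XML are hypothetical and only for illustration; index 0 is left free for the background class, which is the convention torchvision detection models use.

```python
import xml.etree.ElementTree as ET

# Hypothetical class-name-to-index map; index 0 is reserved for background.
CLASS_TO_INDEX = {"cat": 1, "dog": 2}


def parse_voc_annotation(xml_text, class_to_index=CLASS_TO_INDEX):
    """Parse one Pascal VOC XML string into (boxes, labels).

    Boxes are [xmin, ymin, xmax, ymax] in pixels; objects whose class
    is not in class_to_index are skipped.
    """
    root = ET.fromstring(xml_text.strip())
    boxes, labels = [], []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        if name not in class_to_index:
            continue  # not a class we are training on
        bndbox = obj.find("bndbox")
        box = [float(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(box)
        labels.append(class_to_index[name])
    return boxes, labels


# Made-up annotation with one known class ("cat") and one unknown class ("bird").
SAMPLE_XML = """
<annotation>
  <object>
    <name>cat</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>bird</name>
    <bndbox><xmin>1</xmin><ymin>2</ymin><xmax>3</xmax><ymax>4</ymax></bndbox>
  </object>
</annotation>
"""

boxes, labels = parse_voc_annotation(SAMPLE_XML)
print(boxes, labels)
```

The returned lists can then be wrapped in tensors to build the per-image target dictionaries (`boxes`, `labels`) that the training loop passes to the model.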

python

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Load pre-trained Faster R-CNN model
# (torchvision >= 0.13; older releases use pretrained=True instead)
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box predictor head with a new one for custom classes.
# A plain nn.Linear does not work here: the head must return both
# class scores and box regression deltas.
num_classes = 2  # 1 class (object) + background
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Example training loop skeleton.
# data_loader, device, and optimizer are assumed to be defined elsewhere.
model.to(device)
model.train()  # in train mode the model returns a dict of losses
for images, targets in data_loader:
    images = list(image.to(device) for image in images)
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

    loss_dict = model(images, targets)
    losses = sum(loss for loss in loss_dict.values())

    optimizer.zero_grad()
    losses.backward()
    optimizer.step()

💻

Example

This example shows how to fine-tune a pre-trained Faster R-CNN model on a small custom dataset using PyTorch. It demonstrates loading the model, modifying the classifier for two classes, and a simple training loop.

python

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.datasets import VOCDetection
from torchvision.transforms import ToTensor
from torch.utils.data import DataLoader

# Device setup
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

# Load pre-trained Faster R-CNN (torchvision >= 0.13; older releases use pretrained=True)
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
num_classes = 2  # background + 1 custom class
in_features = model.roi_heads.box_predictor.cls_score.in_features
# The new head must output class scores and box deltas, so use FastRCNNPredictor, not nn.Linear
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
model.to(device)

# Dummy dataset loader (replace with your custom dataset and annotations)
dataset = VOCDetection(root='./data', year='2007', image_set='train', download=True, transform=ToTensor())
data_loader = DataLoader(dataset, batch_size=2, shuffle=True, collate_fn=lambda x: tuple(zip(*x)))

# Optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=0.0005)

# Training loop (1 epoch for demo)
model.train()
for images, targets in data_loader:
    images = list(img.to(device) for img in images)
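    # Placeholder targets: one dummy box per image; the real VOC annotations are ignored here.
    # Replace with boxes and labels parsed from your own annotations.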
    targets = [{"boxes": torch.tensor([[50, 50, 100, 100]], dtype=torch.float32).to(device), "labels": torch.tensor([1]).to(device)} for _ in images]

    loss_dict = model(images, targets)
    losses = sum(loss for loss in loss_dict.values())

    optimizer.zero_grad()
    losses.backward()
    optimizer.step()
    print(f"Loss: {losses.item():.4f}")
    break  # run only one batch for demo

Output (illustrative; the exact loss value varies from run to run)

Loss: 1.2345
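
The example stops after training, so evaluation is not shown. Detection metrics such as mAP are built on Intersection over Union (IoU), which measures how well a predicted box overlaps a ground-truth box. Below is a minimal pure-Python sketch, assuming boxes in `[xmin, ymin, xmax, ymax]` pixel format; the function name `box_iou_xyxy` is illustrative and not taken from any library.

```python
def box_iou_xyxy(box_a, box_b):
    """Intersection over Union of two [xmin, ymin, xmax, ymax] boxes."""
    # Corners of the overlapping rectangle (if there is any overlap).
    inter_xmin = max(box_a[0], box_b[0])
    inter_ymin = max(box_a[1], box_b[1])
    inter_xmax = min(box_a[2], box_b[2])
    inter_ymax = min(box_a[3], box_b[3])

    # Clamp at zero so that disjoint boxes give an empty intersection.
    inter_w = max(0.0, inter_xmax - inter_xmin)
    inter_h = max(0.0, inter_ymax - inter_ymin)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


iou_identical = box_iou_xyxy([0, 0, 10, 10], [0, 0, 10, 10])
iou_partial = box_iou_xyxy([0, 0, 10, 10], [5, 5, 15, 15])
iou_disjoint = box_iou_xyxy([0, 0, 1, 1], [2, 2, 3, 3])
print(iou_identical, iou_partial, iou_disjoint)
```

A prediction is usually counted as correct when its IoU with a ground-truth box of the same class exceeds a chosen threshold; 0.5 is a common choice.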

⚠️

Common Pitfalls

Incorrect Annotation Format: Using wrong bounding box formats or missing labels causes training errors.
Insufficient Data: Too few images lead to poor model generalization.
Not Freezing Pretrained Layers: Updating every pretrained layer from the first epoch wastes resources and can overfit small datasets; fine-tune only the detection head initially.
Ignoring Data Augmentation: Without augmentation, the model may overfit small datasets.
Wrong Learning Rate: Too high or too low learning rates can cause training to fail or be very slow.
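
Many annotation-format errors can be caught before training starts. Here is a minimal sketch of such a check, assuming boxes in `[xmin, ymin, xmax, ymax]` pixel format and labels numbered from 1 with 0 reserved for background; the helper name `find_annotation_problems` is hypothetical.

```python
def find_annotation_problems(boxes, labels, image_width, image_height, num_classes):
    """Return a list of human-readable problems found in one image's annotations."""
    problems = []
    if len(boxes) != len(labels):
        problems.append(f"{len(boxes)} boxes but {len(labels)} labels")

    for i, box in enumerate(boxes):
        if len(box) != 4:
            problems.append(f"box {i}: expected 4 coordinates, got {len(box)}")
            continue
        xmin, ymin, xmax, ymax = box
        # Zero-area or inverted boxes have no valid region to learn from.
        if xmax <= xmin or ymax <= ymin:
            problems.append(f"box {i}: non-positive width or height")
        if xmin < 0 or ymin < 0 or xmax > image_width or ymax > image_height:
            problems.append(f"box {i}: extends outside the image")

    for i, label in enumerate(labels):
        # Label 0 is background, so object labels run from 1 to num_classes - 1.
        if not 1 <= label < num_classes:
            problems.append(f"label {i}: {label} is outside 1..{num_classes - 1}")
    return problems


good = find_annotation_problems([[10, 10, 50, 60]], [1], 100, 100, num_classes=2)
bad = find_annotation_problems([[50, 10, 10, 60], [0, 0, 120, 20]], [0], 100, 100, num_classes=2)
print(good)
print(bad)
```

Running a check like this over the whole dataset once is usually cheaper than debugging a failure partway through training.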

python

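import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Wrong way: training all layers without freezing
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
# No freezing: every parameter is handed to the optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Right way: freeze backbone layers, then optimize only what is still trainable
for param in model.backbone.parameters():
    param.requires_grad = False
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable_params, lr=0.005)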
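
Augmentation for detection differs from classification in one respect: the bounding boxes must be transformed together with the image. The following is a minimal sketch of a horizontal flip, assuming the image is a nested list of pixel rows and boxes are in `[xmin, ymin, xmax, ymax]` pixel format. The function name `hflip_image_and_boxes` is illustrative; in practice an augmentation library would normally handle this.

```python
def hflip_image_and_boxes(image_rows, boxes):
    """Mirror an image left-to-right and adjust its boxes to match.

    image_rows: list of rows, each row a list of pixel values.
    boxes: list of [xmin, ymin, xmax, ymax] in pixels.
    """
    image_width = len(image_rows[0])
    flipped_rows = [row[::-1] for row in image_rows]
    # After mirroring, the old right edge becomes the new left edge.
    flipped_boxes = [
        [image_width - xmax, ymin, image_width - xmin, ymax]
        for xmin, ymin, xmax, ymax in boxes
    ]
    return flipped_rows, flipped_boxes


# Tiny 2x4 "image" with one box covering the left half.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
flipped_image, flipped_boxes = hflip_image_and_boxes(image, [[0, 0, 2, 2]])
print(flipped_image, flipped_boxes)
```

Flipping the image without updating the boxes leaves the labels pointing at the wrong region, which quietly degrades training.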

📊

Quick Reference

Data Labeling: Use tools like LabelImg or Roboflow.
Annotation Formats: COCO JSON, Pascal VOC XML, or YOLO TXT.
Popular Models: YOLOv5, Faster R-CNN, SSD.
Training Tips: Start with pretrained weights, freeze backbone, use augmentation.
Evaluation Metrics: mean Average Precision (mAP), Precision, Recall.
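
The annotation formats listed above store boxes differently. A YOLO TXT line holds a class index followed by a normalized center and size, while Pascal VOC stores pixel corner coordinates. Below is a minimal conversion sketch in both directions; the function names are hypothetical. Note that YOLO class indices start at 0, whereas torchvision detection models reserve label 0 for background, so an offset may be needed when building targets.

```python
def yolo_to_voc(line, image_width, image_height):
    """Convert 'class xc yc w h' (normalized) to (class_id, [xmin, ymin, xmax, ymax]) in pixels."""
    class_id, xc, yc, w, h = line.split()
    xc, w = float(xc) * image_width, float(w) * image_width
    yc, h = float(yc) * image_height, float(h) * image_height
    return int(class_id), [xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2]


def voc_to_yolo(class_id, box, image_width, image_height):
    """Convert pixel corners back to a normalized YOLO TXT line."""
    xmin, ymin, xmax, ymax = box
    xc = (xmin + xmax) / 2 / image_width
    yc = (ymin + ymax) / 2 / image_height
    w = (xmax - xmin) / image_width
    h = (ymax - ymin) / image_height
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


class_id, voc_box = yolo_to_voc("0 0.5 0.5 0.2 0.4", image_width=100, image_height=200)
yolo_line = voc_to_yolo(class_id, voc_box, image_width=100, image_height=200)
print(class_id, voc_box)
print(yolo_line)
```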

✅

Key Takeaways

Collect and label images accurately with bounding boxes before training.

Use pretrained models and fine-tune only necessary layers for faster training.

Choose the right annotation format and convert data accordingly.

Apply data augmentation to improve model robustness on small datasets.

Evaluate model performance using the mean Average Precision (mAP) metric.
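
The mAP metric is the mean of per-class Average Precision (AP) values. Here is a minimal sketch of AP for a single class, assuming each detection has already been marked as a true or false positive by matching it against ground truth at some IoU threshold. It uses all-point interpolation of the precision-recall curve, which is one common convention; benchmarks differ in the details. The function name `average_precision` is illustrative.

```python
def average_precision(scored_matches, num_ground_truth):
    """AP for one class.

    scored_matches: list of (confidence, is_true_positive), one per detection.
    num_ground_truth: number of ground-truth objects of this class.
    """
    if num_ground_truth == 0 or not scored_matches:
        return 0.0

    # Rank detections from most to least confident.
    ranked = sorted(scored_matches, key=lambda match: match[0], reverse=True)

    true_positives = 0
    precisions, recalls = [], []
    for rank, (_, is_true_positive) in enumerate(ranked, start=1):
        true_positives += int(is_true_positive)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)

    # Interpolate: precision at each point becomes the best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


ap_mixed = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_ground_truth=2)
ap_perfect = average_precision([(0.9, True), (0.8, True)], num_ground_truth=2)
print(ap_mixed, ap_perfect)
```

Averaging this value over all classes gives mAP at the chosen IoU threshold.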