How to Train a Custom Object Detection Model in Computer Vision
To train a custom object detection model, first collect images and label them with bounding boxes using a tool such as LabelImg. Then prepare the dataset, choose a model architecture such as YOLO or Faster R-CNN, train the model on your data, and evaluate its accuracy with a metric such as mean Average Precision (mAP).
Syntax
Training a custom object detection model typically follows these steps:
- Data Collection: Gather images containing objects you want to detect.
- Annotation: Label objects with bounding boxes and class names.
- Data Preparation: Convert annotations to the required format (e.g., COCO, Pascal VOC).
- Model Selection: Choose a detection architecture like YOLO, SSD, or Faster R-CNN.
- Training: Use a deep learning framework (e.g., TensorFlow, PyTorch) to train the model on your dataset.
- Evaluation: Measure performance using metrics like mean Average Precision (mAP).
python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Load a pre-trained Faster R-CNN model
# (torchvision >= 0.13; older releases use pretrained=True instead)
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box predictor head with one sized for the custom classes
num_classes = 2  # 1 class (object) + background
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Example training loop skeleton
# (data_loader, device, and optimizer are assumed to be defined elsewhere)
for images, targets in data_loader:
    images = [image.to(device) for image in images]
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

    # In training mode the model returns a dict of losses
    loss_dict = model(images, targets)
    losses = sum(loss for loss in loss_dict.values())

    optimizer.zero_grad()
    losses.backward()
    optimizer.step()
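The data preparation step often comes down to converting box coordinates between layouts. As a minimal sketch, assuming COCO boxes are stored as `[x_min, y_min, width, height]` in pixels and YOLO boxes as normalized `[x_center, y_center, width, height]`, the helpers below convert both to the `[x_min, y_min, x_max, y_max]` pixel layout that torchvision's Faster R-CNN expects. The function names `coco_to_xyxy` and `yolo_to_xyxy` are hypothetical and not part of any library.

```python
def coco_to_xyxy(box):
    """Convert a COCO-style [x_min, y_min, width, height] box to corner format."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def yolo_to_xyxy(box, img_w, img_h):
    """Convert a normalized YOLO-style [cx, cy, w, h] box to pixel corner format."""
    cx, cy, w, h = box
    x_min = (cx - w / 2) * img_w
    y_min = (cy - h / 2) * img_h
    x_max = (cx + w / 2) * img_w
    y_max = (cy + h / 2) * img_h
    return [x_min, y_min, x_max, y_max]


print(coco_to_xyxy([50, 50, 50, 50]))                  # [50, 50, 100, 100]
print(yolo_to_xyxy([0.5, 0.5, 0.25, 0.5], 200, 100))   # [75.0, 25.0, 125.0, 75.0]
```

Whichever layout the labeling tool exports, converting once up front keeps the training loop free of format-specific logic.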
Example
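This example shows how to fine-tune a pre-trained Faster R-CNN model using PyTorch. It loads the model, replaces the box predictor for two classes (background plus one custom class), and runs a single training step. Pascal VOC 2007 images and a fixed placeholder box stand in for a real custom dataset and its annotations.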
python
import torch
from torch.utils.data import DataLoader
from torchvision.datasets import VOCDetection
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.transforms import ToTensor

# Device setup
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

# Load pre-trained Faster R-CNN
# (torchvision >= 0.13; older releases use pretrained=True instead)
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
num_classes = 2  # background + 1 custom class
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
model.to(device)

# Dummy dataset loader (replace with your custom dataset and annotations)
dataset = VOCDetection(root='./data', year='2007', image_set='train',
                       download=True, transform=ToTensor())
data_loader = DataLoader(dataset, batch_size=2, shuffle=True,
                         collate_fn=lambda x: tuple(zip(*x)))

# Optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=0.0005)

# Training loop (a single batch for demo)
model.train()
for images, targets in data_loader:
    images = [img.to(device) for img in images]

    # Placeholder targets in the expected format; real code would build
    # these from the dataset's annotations instead of a fixed box
    targets = [
        {
            "boxes": torch.tensor([[50, 50, 100, 100]], dtype=torch.float32, device=device),
            "labels": torch.tensor([1], device=device),
        }
        for _ in images
    ]

    loss_dict = model(images, targets)
    losses = sum(loss for loss in loss_dict.values())

    optimizer.zero_grad()
    losses.backward()
    optimizer.step()

    print(f"Loss: {losses.item():.4f}")
    break  # run only one batch for demo
Output
Loss: 1.2345
The number above is only illustrative; the exact loss value differs from run to run.
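The placeholder targets in the example would normally be built from the dataset's annotations. As a rough sketch, assuming each annotation is a nested dictionary shaped like a parsed Pascal VOC XML file (the layout `VOCDetection` is understood to return), the hypothetical `voc_to_target` helper below extracts boxes and labels. The `CLASS_TO_ID` mapping and the sample annotation are assumptions for illustration only.

```python
# Hypothetical mapping from class name to label id; 0 is reserved for background
CLASS_TO_ID = {"dog": 1}


def voc_to_target(annotation, class_to_id=CLASS_TO_ID):
    """Turn a parsed VOC-style annotation dict into plain boxes/labels lists."""
    objects = annotation["annotation"].get("object", [])
    if isinstance(objects, dict):  # tolerate a single object stored as a dict
        objects = [objects]

    boxes, labels = [], []
    for obj in objects:
        name = obj["name"]
        if name not in class_to_id:
            continue  # skip classes the model is not being trained on
        bb = obj["bndbox"]
        boxes.append([float(bb[k]) for k in ("xmin", "ymin", "xmax", "ymax")])
        labels.append(class_to_id[name])
    return {"boxes": boxes, "labels": labels}


# Made-up annotation used only to exercise the helper
sample = {
    "annotation": {
        "object": [
            {"name": "dog", "bndbox": {"xmin": "48", "ymin": "240", "xmax": "195", "ymax": "371"}},
            {"name": "chair", "bndbox": {"xmin": "8", "ymin": "12", "xmax": "352", "ymax": "498"}},
        ]
    }
}
print(voc_to_target(sample))
# {'boxes': [[48.0, 240.0, 195.0, 371.0]], 'labels': [1]}
```

The returned lists would still need wrapping, for example with `torch.tensor(boxes, dtype=torch.float32)` and `torch.tensor(labels, dtype=torch.int64)`, before being passed to the model. Images left with no boxes after filtering may need special handling.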
Common Pitfalls
- Incorrect Annotation Format: Using the wrong bounding box format or omitting labels causes training errors.
- Insufficient Data: Too few images lead to poor model generalization.
- Not Freezing Pretrained Layers: Updating every pretrained layer from the start costs more compute and can overfit a small dataset; consider freezing the backbone and fine-tuning only the detection heads initially.
- Ignoring Data Augmentation: Without augmentation, the model may overfit small datasets.
- Wrong Learning Rate: Too high or too low learning rates can cause training to fail or be very slow.
python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Less efficient on small datasets: nothing is frozen explicitly
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Better starting point: freeze the backbone layers
for param in model.backbone.parameters():
    param.requires_grad = False

# Pass only the parameters that are still trainable to the optimizer
optimizer = torch.optim.SGD(filter(lambda p: p.requires_grad, model.parameters()), lr=0.005)
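For the augmentation pitfall, note that geometric transforms have to be applied to the boxes as well as to the image. The following is a minimal sketch, assuming `[x_min, y_min, x_max, y_max]` pixel boxes; `hflip_boxes` and `maybe_hflip` are hypothetical helper names, and the image itself would be flipped separately, for example with a library transform.

```python
import random


def hflip_boxes(boxes, img_w):
    """Mirror corner-format boxes around the vertical centre line of the image."""
    return [
        [img_w - x_max, y_min, img_w - x_min, y_max]
        for x_min, y_min, x_max, y_max in boxes
    ]


def maybe_hflip(boxes, img_w, p=0.5, rng=random):
    """Flip the boxes with probability p; also report whether a flip happened
    so the caller can flip the image in the same way."""
    flipped = rng.random() < p
    return (hflip_boxes(boxes, img_w) if flipped else boxes), flipped


print(hflip_boxes([[50, 50, 100, 100]], img_w=300))  # [[200, 50, 250, 100]]
```

Returning the `flipped` flag keeps the image and its boxes in sync, which is the part that is easy to get wrong when augmentation is added by hand.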
Quick Reference
- Data Labeling: Use tools like LabelImg or Roboflow.
- Annotation Formats: COCO JSON, Pascal VOC XML, or YOLO TXT.
- Popular Models: YOLOv5, Faster R-CNN, SSD.
- Training Tips: Start with pretrained weights, freeze backbone, use augmentation.
- Evaluation Metrics: mean Average Precision (mAP), Precision, Recall.
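The evaluation metrics listed above all rest on Intersection over Union (IoU) between predicted and ground-truth boxes. The sketch below is a simplified illustration rather than a full mAP implementation: it computes IoU, then precision and recall for a single image at a fixed IoU threshold, assuming the predictions are already sorted by descending confidence. The function names and the 0.5 threshold are assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over Union of two [x_min, y_min, x_max, y_max] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, gts, thresh=0.5):
    """Greedily match each prediction to its best unmatched ground-truth box."""
    matched = set()
    true_positives = 0
    for pred in preds:
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(gts):
            if idx in matched:
                continue
            overlap = iou(pred, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= thresh:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(gts) if gts else 0.0
    return precision, recall


gts = [[0, 0, 10, 10]]
preds = [[0, 0, 10, 5], [20, 20, 30, 30]]
print(iou(preds[0], gts[0]))          # 0.5
print(precision_recall(preds, gts))   # (0.5, 1.0)
```

Average Precision summarizes the precision-recall curve for one class across confidence thresholds, and mAP averages that value over classes; for real evaluations an established implementation is usually a safer choice than a hand-rolled one.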
Key Takeaways
Collect and label images accurately with bounding boxes before training.
Use pretrained models and fine-tune only necessary layers for faster training.
Choose the right annotation format and convert data accordingly.
Apply data augmentation to improve model robustness on small datasets.
Evaluate model performance using the mean Average Precision (mAP) metric.