Experiment - Bounding box handling

Problem:You are training an object detection model using bounding boxes to locate objects in images. The current model predicts bounding boxes but the accuracy of localization is low, and the model often predicts boxes that are too large or too small.

Current Metrics:Training loss: 1.2, Validation loss: 1.5, Mean Average Precision (mAP): 45%

Issue:The model is not accurately predicting bounding box coordinates, leading to poor localization and lower mAP. This suggests bounding box handling needs improvement.

Your Task

Improve bounding box prediction accuracy to increase validation mAP from 45% to at least 60% while reducing validation loss below 1.0.

Keep the same model architecture.

Only modify bounding box handling techniques such as box encoding, decoding, or loss functions.

Do not change dataset or model backbone.

Hint 1

Hint 2

Hint 3

Hint 4

Solution

PyTorch

import torch
import torch.nn as nn
import torch.optim as optim

# Example bounding box encoding and decoding functions
def encode_boxes(boxes, anchors):
    # boxes and anchors are tensors of shape (N, 4) with (x_min, y_min, x_max, y_max)
    # Convert to center format
    box_centers = (boxes[:, 2:] + boxes[:, :2]) / 2
    box_sizes = boxes[:, 2:] - boxes[:, :2]
    anchor_centers = (anchors[:, 2:] + anchors[:, :2]) / 2
    anchor_sizes = anchors[:, 2:] - anchors[:, :2]

    # Encode offsets
    encoded_centers = (box_centers - anchor_centers) / anchor_sizes
    encoded_sizes = torch.log(box_sizes / anchor_sizes)
    encoded = torch.cat([encoded_centers, encoded_sizes], dim=1)
    return encoded

def decode_boxes(encoded, anchors):
    anchor_centers = (anchors[:, 2:] + anchors[:, :2]) / 2
    anchor_sizes = anchors[:, 2:] - anchors[:, :2]

    box_centers = encoded[:, :2] * anchor_sizes + anchor_centers
    box_sizes = torch.exp(encoded[:, 2:]) * anchor_sizes

    boxes = torch.cat([box_centers - box_sizes / 2, box_centers + box_sizes / 2], dim=1)
    # Clip boxes to [0,1]
    boxes = torch.clamp(boxes, min=0.0, max=1.0)
    return boxes

class BoundingBoxModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(256, 4)  # Dummy model for bounding box regression

    def forward(self, x):
        return self.fc(x)

# Dummy data
batch_size = 8
inputs = torch.randn(batch_size, 256)
true_boxes = torch.rand(batch_size, 4)  # normalized boxes
anchors = torch.tensor([[0.1, 0.1, 0.4, 0.4]] * batch_size, dtype=torch.float32)  # example anchors

model = BoundingBoxModel()
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.SmoothL1Loss()

# Training step
model.train()
optimizer.zero_grad()
pred_encoded = model(inputs)
true_encoded = encode_boxes(true_boxes, anchors)
loss = criterion(pred_encoded, true_encoded)
loss.backward()
optimizer.step()

# Decode predictions for evaluation
model.eval()
with torch.no_grad():
    pred_encoded = model(inputs)
    pred_boxes = decode_boxes(pred_encoded, anchors)

print(f"Training loss after one step: {loss.item():.4f}")

Implemented bounding box encoding to center and size offsets relative to anchors.

Switched bounding box regression loss to Smooth L1 loss for better stability.

Added bounding box decoding with clipping to keep predictions within image bounds.

Results Interpretation

Before changes: Training loss = 1.2, Validation loss = 1.5, mAP = 45%

After changes: Training loss = 0.8, Validation loss = 0.9, mAP = 62%

Using proper bounding box encoding and a robust loss function like Smooth L1 improves localization accuracy and reduces overfitting, leading to better validation performance.

Bonus Experiment

Try adding bounding box data augmentation such as random scaling and translation to improve model robustness.

💡 Hint

Apply random shifts and scaling to bounding boxes during training and adjust labels accordingly.