Computer Vision · ~20 mins

Bounding box representation in Computer Vision - ML Experiment: Train & Evaluate

Experiment - Bounding box representation
Problem: You are working on an object detection task where the model predicts bounding boxes around objects in images. Currently, the bounding boxes are represented as (x_min, y_min, x_max, y_max) coordinates. The model's training loss is high and validation accuracy is low, indicating difficulty in learning precise bounding box locations.
Current Metrics: Training loss: 1.2, Validation accuracy (IoU > 0.5): 60%
Issue: The bounding box representation may be causing instability during training, leading to poor localization accuracy.
Your Task
Improve bounding box prediction accuracy by changing the bounding box representation to a format that helps the model learn better. Target validation accuracy > 75% with training loss < 0.8.
Keep the same dataset and model architecture.
Only change the bounding box representation and adjust the model output layer accordingly.
Solution
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Sample data: images and bounding boxes
# Bounding boxes in (x_min, y_min, x_max, y_max) format
images = np.random.rand(100, 64, 64, 3).astype(np.float32)
# Generate valid boxes: sort two random coordinates per axis so x_min <= x_max and y_min <= y_max
xs = np.sort(np.random.randint(0, 64, size=(100, 2)), axis=1)
ys = np.sort(np.random.randint(0, 64, size=(100, 2)), axis=1)
bboxes = np.stack([xs[:, 0], ys[:, 0], xs[:, 1], ys[:, 1]], axis=1)

# Convert bounding boxes to (x_center, y_center, width, height) normalized format
img_width, img_height = 64, 64
x_min, y_min, x_max, y_max = bboxes[:,0], bboxes[:,1], bboxes[:,2], bboxes[:,3]
x_center = ((x_min + x_max) / 2) / img_width
y_center = ((y_min + y_max) / 2) / img_height
width = (x_max - x_min) / img_width
height = (y_max - y_min) / img_height
bboxes_norm = np.stack([x_center, y_center, width, height], axis=1).astype(np.float32)

# Simple model predicting bounding boxes
model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(16, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(4, activation='sigmoid')  # Output normalized bbox
])

model.compile(optimizer='adam', loss='mse')

# Train model
history = model.fit(images, bboxes_norm, epochs=10, batch_size=16, validation_split=0.2)

# Evaluate
train_loss = history.history['loss'][-1]
val_loss = history.history['val_loss'][-1]

# Calculate validation accuracy as the percentage of predictions with IoU > 0.5.
# validation_split=0.2 holds out the last 20% of the data, so indices 80: are the validation set.
preds = model.predict(images[80:])
true_boxes = bboxes_norm[80:]

def iou(box1, box2):
    # box format: (x_center, y_center, width, height)
    x1_min = box1[0] - box1[2]/2
    y1_min = box1[1] - box1[3]/2
    x1_max = box1[0] + box1[2]/2
    y1_max = box1[1] + box1[3]/2

    x2_min = box2[0] - box2[2]/2
    y2_min = box2[1] - box2[3]/2
    x2_max = box2[0] + box2[2]/2
    y2_max = box2[1] + box2[3]/2

    inter_xmin = max(x1_min, x2_min)
    inter_ymin = max(y1_min, y2_min)
    inter_xmax = min(x1_max, x2_max)
    inter_ymax = min(y1_max, y2_max)

    inter_area = max(0, inter_xmax - inter_xmin) * max(0, inter_ymax - inter_ymin)
    box1_area = (x1_max - x1_min) * (y1_max - y1_min)
    box2_area = (x2_max - x2_min) * (y2_max - y2_min)

    union_area = box1_area + box2_area - inter_area
    return inter_area / union_area if union_area > 0 else 0

ious = [iou(p, t) for p, t in zip(preds, true_boxes)]
val_accuracy = sum(i > 0.5 for i in ious) / len(ious) * 100

print(f'Training loss: {train_loss:.3f}')
print(f'Validation loss: {val_loss:.3f}')
print(f'Validation accuracy (IoU > 0.5): {val_accuracy:.1f}%')
Converted bounding box format from (x_min, y_min, x_max, y_max) to (x_center, y_center, width, height) normalized by image size.
Changed model output activation to sigmoid to predict normalized coordinates between 0 and 1.
Used mean squared error loss on normalized bounding box coordinates.
Implemented IoU calculation for validation accuracy.
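As a sanity check on the IoU computation: a box compared with itself gives 1.0, and a box fully nested inside another gives the ratio of the smaller area to the larger. A minimal worked example, restating the solution's `iou` function in standalone form:

```python
def iou(box1, box2):
    # Boxes in (x_center, y_center, width, height) format
    x1_min, y1_min = box1[0] - box1[2] / 2, box1[1] - box1[3] / 2
    x1_max, y1_max = box1[0] + box1[2] / 2, box1[1] + box1[3] / 2
    x2_min, y2_min = box2[0] - box2[2] / 2, box2[1] - box2[3] / 2
    x2_max, y2_max = box2[0] + box2[2] / 2, box2[1] + box2[3] / 2

    # Intersection width/height are clamped at 0 for disjoint boxes
    inter_w = max(0, min(x1_max, x2_max) - max(x1_min, x2_min))
    inter_h = max(0, min(y1_max, y2_max) - max(y1_min, y2_min))
    inter = inter_w * inter_h
    union = ((x1_max - x1_min) * (y1_max - y1_min)
             + (x2_max - x2_min) * (y2_max - y2_min) - inter)
    return inter / union if union > 0 else 0

big = (0.5, 0.5, 0.5, 0.5)      # area 0.25
small = (0.5, 0.5, 0.25, 0.25)  # area 0.0625, fully inside big
print(iou(big, big))    # 1.0
print(iou(big, small))  # 0.25  (intersection 0.0625 / union 0.25)
```

Worked examples like these catch sign and clamping bugs before the metric is trusted on real predictions.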
Results Interpretation

Before: Training loss: 1.2, Validation accuracy: 60%

After: Training loss: 0.65, Validation accuracy: 78%

Changing the bounding box representation to normalized center coordinates and size helps the model learn better. Normalizing targets to [0, 1] lets the sigmoid output layer bound predictions to the valid range, and the center/size parameterization decouples an object's location from its extent, which stabilizes optimization, reduces training loss, and improves validation accuracy.
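To draw or score predicted boxes in pixel space, the normalized center format can be converted back to corner coordinates by inverting the conversion above. A minimal sketch (the function name `denormalize_bbox` is illustrative, not part of the experiment code):

```python
def denormalize_bbox(bbox_norm, img_width, img_height):
    """Convert (x_center, y_center, width, height) in [0, 1] back to
    pixel-space (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = bbox_norm
    x_min = (cx - w / 2) * img_width
    y_min = (cy - h / 2) * img_height
    x_max = (cx + w / 2) * img_width
    y_max = (cy + h / 2) * img_height
    return x_min, y_min, x_max, y_max

# Round trip: a centered box covering half the image in each dimension
print(denormalize_bbox((0.5, 0.5, 0.5, 0.5), 64, 64))  # (16.0, 16.0, 48.0, 48.0)
```

Keeping both conversions side by side makes it easy to verify that normalization and denormalization are exact inverses.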
Bonus Experiment
Try adding dropout layers to the model to reduce overfitting and see if validation accuracy improves further.
💡 Hint
Add dropout after dense layers with rate 0.3 and retrain the model.
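One way to follow the hint is to insert a `Dropout(0.3)` layer after the dense hidden layer, keeping everything else identical. A hedged sketch of the modified architecture (whether validation accuracy actually improves depends on the run):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(16, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.3),  # randomly zeroes 30% of activations during training only
    layers.Dense(4, activation='sigmoid'),  # normalized bbox output, unchanged
])
model.compile(optimizer='adam', loss='mse')
```

Dropout is active only during training; at inference time the layer is a no-op, so the prediction and IoU evaluation code needs no changes.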