Computer Vision · ~20 mins

Annotation quality in Computer Vision - ML Experiment: Train & Evaluate

Experiment - Annotation quality
Problem: You are training an object detection model on images with bounding box annotations. The model's validation accuracy is low because some annotations are incorrect or inconsistent.
Current Metrics: Training mAP: 85%, Validation mAP: 60%
Issue: The model overfits to noisy or incorrect annotations, causing poor validation performance.
Your Task
Improve validation mAP to at least 75% by improving annotation quality without changing the model architecture.
Do not change the model architecture or hyperparameters.
Only modify the dataset annotations or data preprocessing.
Solution
import cv2
import json

# Load annotations
with open('annotations.json', 'r') as f:
    annotations = json.load(f)

# Function to check and fix bounding boxes
# Ensures boxes are within image bounds and have positive area

def fix_bboxes(annots, img_width, img_height):
    fixed_annots = []
    for obj in annots:
        x_min, y_min, x_max, y_max = obj['bbox']
        # Clamp coordinates
        x_min = max(0, min(x_min, img_width - 1))
        y_min = max(0, min(y_min, img_height - 1))
        x_max = max(0, min(x_max, img_width - 1))
        y_max = max(0, min(y_max, img_height - 1))
        # Skip inverted or zero-area boxes (they cannot be repaired reliably)
        if x_max <= x_min or y_max <= y_min:
            continue
        fixed_annots.append({'label': obj['label'], 'bbox': [x_min, y_min, x_max, y_max]})
    return fixed_annots

# Process dataset
fixed_dataset = {}
for img_name, annots in annotations.items():
    img = cv2.imread(f'images/{img_name}')
    if img is None:
        continue
    h, w = img.shape[:2]
    fixed_annots = fix_bboxes(annots, w, h)
    if fixed_annots:
        fixed_dataset[img_name] = fixed_annots

# Save fixed annotations
with open('fixed_annotations.json', 'w') as f:
    json.dump(fixed_dataset, f)

# After fixing annotations, retrain the model with the same code but using fixed_annotations.json
# (Model training code not shown here for brevity)

# Expected improved metrics after retraining:
# Training mAP: 82%
# Validation mAP: 77%
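Before retraining, it is worth confirming that the cleaned file really contains only valid boxes. This is a minimal sketch, not part of the solution above; it assumes the same annotation layout (a dict of image names to lists of objects with a `label` and an `[x_min, y_min, x_max, y_max]` bbox) and the `fixed_annotations.json` filename written by the script.

```python
import json

def boxes_ok(annots):
    """Return True if every box in the list has positive area."""
    return all(x_max > x_min and y_max > y_min
               for x_min, y_min, x_max, y_max in (obj['bbox'] for obj in annots))

def check_file(path='fixed_annotations.json'):
    """Report images whose annotations still contain degenerate boxes."""
    with open(path) as f:
        data = json.load(f)
    bad = [name for name, annots in data.items() if not boxes_ok(annots)]
    print(f"{len(data)} images checked, {len(bad)} with bad boxes")
    return bad
```

If `check_file()` returns an empty list, the cleaning step did its job and retraining can proceed.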
Reviewed and fixed bounding box coordinates to ensure they are within image boundaries.
Removed invalid or zero-area bounding boxes.
Filtered out images with no valid annotations after cleaning.
Used cleaned annotations for retraining the model.
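Beyond out-of-bounds and zero-area boxes, another common annotation-quality problem is near-duplicate boxes on the same object. The sketch below, which is an optional extra step rather than part of the solution shown, drops same-label boxes whose IoU with an already-kept box exceeds a threshold; the `iou_thresh` value of 0.9 is an illustrative assumption.

```python
def iou(a, b):
    """Intersection-over-union of two [x_min, y_min, x_max, y_max] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def dedup_annots(annots, iou_thresh=0.9):
    """Keep only the first of any group of heavily overlapping same-label boxes."""
    kept = []
    for obj in annots:
        if any(k['label'] == obj['label'] and iou(k['bbox'], obj['bbox']) >= iou_thresh
               for k in kept):
            continue
        kept.append(obj)
    return kept
```

This could be called on each image's list inside the processing loop, after `fix_bboxes`.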
Results Interpretation

Before: Training mAP: 85%, Validation mAP: 60%
After: Training mAP: 82%, Validation mAP: 77%

Cleaning and improving annotation quality reduces overfitting to noisy labels and improves validation performance, demonstrating the importance of good data quality in machine learning.
Bonus Experiment
Try using data augmentation techniques like random cropping and flipping to further improve validation accuracy.
💡 Hint
Augmentation can help the model generalize better by showing varied examples, reducing overfitting.
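One subtlety when augmenting detection data is that the boxes must be transformed along with the image. The sketch below shows the bounding-box side of a horizontal flip, assuming the same `[x_min, y_min, x_max, y_max]` convention as the cleaning script; the corresponding image flip would be `cv2.flip(img, 1)`.

```python
def hflip_annots(image_width, annots):
    """Mirror boxes horizontally to match a horizontally flipped image."""
    flipped = []
    for obj in annots:
        x_min, y_min, x_max, y_max = obj['bbox']
        # A point at x maps to image_width - x, so the box edges swap roles.
        flipped.append({'label': obj['label'],
                        'bbox': [image_width - x_max, y_min,
                                 image_width - x_min, y_max]})
    return flipped
```

Random cropping needs the same care: boxes must be clipped to the crop window and dropped if too little of the object remains.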