import torch
from torch.utils.data import Dataset
from PIL import Image
import os
import torchvision.transforms as T
class CustomDetectionDataset(Dataset):
    """Object-detection dataset backed by a list of annotation dicts.

    Each sample is an image paired with a target dict in the format expected
    by torchvision detection models (``boxes`` as a float32 ``(N, 4)`` tensor
    of ``[xmin, ymin, xmax, ymax]`` coordinates, ``labels`` as an int64
    ``(N,)`` tensor).
    """

    def __init__(self, annotations, img_dir, transforms=None):
        """
        Args:
            annotations: list of dicts, each with keys 'image_id' (image
                filename relative to img_dir), 'boxes' (list of
                [xmin, ymin, xmax, ymax]) and 'labels' (list of int class ids).
            img_dir: directory where images are stored.
            transforms: optional torchvision transforms applied to the image
                only. NOTE(review): geometric transforms (flip/crop/resize)
                would desynchronize image and boxes — only use
                photometric/ToTensor-style transforms here.
        """
        self.annotations = annotations
        self.img_dir = img_dir
        self.transforms = transforms

    def __len__(self):
        """Return the number of annotated samples."""
        return len(self.annotations)

    def __getitem__(self, idx):
        """Load and return the ``(image, target)`` pair at index ``idx``.

        Returns:
            tuple: ``(img, target)`` where ``img`` is a PIL RGB image (or the
            result of ``transforms(img)`` if transforms were given) and
            ``target`` is a dict with 'boxes' and 'labels' tensors.
        """
        ann = self.annotations[idx]
        img_path = os.path.join(self.img_dir, ann['image_id'])
        img = Image.open(img_path).convert("RGB")

        # reshape(-1, 4) guarantees shape (N, 4) even when 'boxes' is empty;
        # torch.as_tensor([]) alone would yield shape (0,), which detection
        # models reject.
        boxes = torch.as_tensor(ann['boxes'], dtype=torch.float32).reshape(-1, 4)
        labels = torch.as_tensor(ann['labels'], dtype=torch.int64)

        target = {'boxes': boxes, 'labels': labels}

        if self.transforms:
            img = self.transforms(img)
        return img, target
# Example usage:
# annotations = [
# {'image_id': 'img1.jpg', 'boxes': [[10, 20, 50, 60]], 'labels': [1]},
# {'image_id': 'img2.jpg', 'boxes': [[15, 25, 55, 65], [30, 40, 70, 80]], 'labels': [2, 3]}
# ]
# img_dir = '/path/to/images'
# transforms = T.ToTensor()
# dataset = CustomDetectionDataset(annotations, img_dir, transforms)
# img, target = dataset[0]
# print(img.shape, target)