PyTorch · ~20 mins

Custom detection dataset in PyTorch - ML Experiment: Train & Evaluate

Experiment - Custom detection dataset
Problem: You want to train an object detection model in PyTorch on your own images and labels, but you do not yet have a dataset class that loads the images and bounding boxes correctly.
Current Metrics: No training metrics yet, because dataset loading is incomplete.
Issue: Without a proper custom dataset class, the model cannot read images and bounding boxes, so training cannot start.
Your Task
Create a custom PyTorch dataset class that correctly loads images together with their bounding boxes and labels, so the model can train.
Subclass torch.utils.data.Dataset.
Load images from file paths.
Return image tensors and target dictionaries containing boxes and labels.
Do not use external dataset libraries.
Solution
PyTorch
import torch
from torch.utils.data import Dataset
from PIL import Image
import os
import torchvision.transforms as T

class CustomDetectionDataset(Dataset):
    def __init__(self, annotations, img_dir, transforms=None):
        """
        annotations: list of dicts, each dict has 'image_id', 'boxes' and 'labels'
        img_dir: directory where images are stored
        transforms: torchvision transforms to apply
        """
        self.annotations = annotations
        self.img_dir = img_dir
        self.transforms = transforms

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, idx):
        ann = self.annotations[idx]
        img_path = os.path.join(self.img_dir, ann['image_id'])
        img = Image.open(img_path).convert("RGB")

        boxes = torch.as_tensor(ann['boxes'], dtype=torch.float32)  # [[xmin, ymin, xmax, ymax], ...]
        labels = torch.as_tensor(ann['labels'], dtype=torch.int64)  # [label1, label2, ...]

        target = {}
        target['boxes'] = boxes
        target['labels'] = labels

        if self.transforms:
            img = self.transforms(img)

        return img, target

# Example usage:
# annotations = [
#     {'image_id': 'img1.jpg', 'boxes': [[10, 20, 50, 60]], 'labels': [1]},
#     {'image_id': 'img2.jpg', 'boxes': [[15, 25, 55, 65], [30, 40, 70, 80]], 'labels': [2, 3]}
# ]
# img_dir = '/path/to/images'
# transforms = T.ToTensor()
# dataset = CustomDetectionDataset(annotations, img_dir, transforms)
# img, target = dataset[0]
# print(img.shape, target)
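One practical detail when batching this dataset: each image can have a different number of boxes, so PyTorch's default collation (which stacks tensors) fails on the target dictionaries. A common sketch, using a tiny stand-in dataset with the same (img, target) interface so it runs without image files, is a DataLoader with a custom collate_fn:

```python
import torch
from torch.utils.data import Dataset, DataLoader

# Minimal stand-in dataset with the same (img, target) interface
# as CustomDetectionDataset, so no image files are needed.
class TinyDetectionDataset(Dataset):
    def __init__(self):
        self.samples = [
            (torch.rand(3, 32, 32),
             {'boxes': torch.tensor([[1., 2., 5., 6.]]),
              'labels': torch.tensor([1])}),
            (torch.rand(3, 32, 32),
             {'boxes': torch.tensor([[0., 0., 4., 4.],
                                     [2., 2., 8., 8.]]),
              'labels': torch.tensor([2, 3])}),
        ]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]

def collate_fn(batch):
    # Keep images and targets as tuples instead of stacking,
    # because each image can have a different number of boxes.
    return tuple(zip(*batch))

loader = DataLoader(TinyDetectionDataset(), batch_size=2, collate_fn=collate_fn)
images, targets = next(iter(loader))
print(len(images), targets[1]['boxes'].shape)
```

This tuple-of-samples convention matches what torchvision's detection models expect as input.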
Created a PyTorch Dataset subclass named CustomDetectionDataset.
Implemented __init__ to accept annotations, image directory, and optional transforms.
Implemented __len__ to return dataset size.
Implemented __getitem__ to load image, convert to RGB, convert boxes and labels to tensors, and apply transforms.
Returned image tensor and target dictionary with 'boxes' and 'labels' keys.
Results Interpretation

Before: No dataset class, so no data loading possible.

After: Dataset class loads images and bounding boxes correctly, enabling model training.

Creating a custom dataset class in PyTorch is essential for loading your own images and annotations properly in object detection tasks.
Bonus Experiment
Add data augmentation transforms like random horizontal flip and color jitter to the dataset.
💡 Hint
Use torchvision.transforms.Compose to combine multiple transforms and apply them in the dataset __getitem__ method.