A custom detection dataset teaches a model to find the objects that matter to you in your own images. It lets the model learn from your own photos and labels instead of a public dataset.
Custom detection dataset in PyTorch
Introduction
When you have your own photos and want to find specific objects in them.
When public datasets don't have the objects you care about.
When you want to train a model to detect items in a new environment, like your home or workplace.
When you want to improve detection accuracy by using your own labeled images.
When you want to test how well a detection model works on your own data.
Syntax
PyTorch
class CustomDetectionDataset(torch.utils.data.Dataset):
    def __init__(self, image_paths, annotations, transforms=None):
        self.image_paths = image_paths
        self.annotations = annotations
        self.transforms = transforms

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert("RGB")
        boxes = self.annotations[idx]['boxes']    # list of [xmin, ymin, xmax, ymax]
        labels = self.annotations[idx]['labels']  # list of integer class labels
        target = {}
        target['boxes'] = torch.tensor(boxes, dtype=torch.float32)
        target['labels'] = torch.tensor(labels, dtype=torch.int64)
        if self.transforms:
            image, target = self.transforms(image, target)
        return image, target
The __getitem__ method returns one image and its target (bounding boxes and labels) at a time.
Annotations must include bounding boxes and labels for each object.
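For example, the annotations list expected by the class might look like this for two images (the box coordinates and class IDs here are made up for illustration):

```python
# One dict per image; each dict pairs a list of [xmin, ymin, xmax, ymax]
# boxes with a list of integer class labels of the same length.
annotations = [
    {"boxes": [[10, 20, 50, 60]], "labels": [1]},
    {"boxes": [[15, 25, 55, 65], [30, 40, 70, 80]], "labels": [2, 3]},
]

# Each image must have exactly one label per box.
for entry in annotations:
    assert len(entry["boxes"]) == len(entry["labels"])
```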
Examples
Create the dataset with lists of image file paths and their annotations.
PyTorch
dataset = CustomDetectionDataset(image_paths, annotations)
Get the first image and its target data (boxes and labels).
PyTorch
image, target = dataset[0]
print(image.size, target)
Add image and target transformations, such as resizing or flipping, during dataset loading. Because the dataset calls self.transforms(image, target), the transform must accept and return both the image and its target; a plain torchvision transforms.Compose, which takes only the image, will not work here.
PyTorch
def detection_transforms(image, target):
    # Apply paired transformations here, updating target['boxes'] to match.
    return image, target

dataset = CustomDetectionDataset(image_paths, annotations, transforms=detection_transforms)
Sample Model
This code creates a simple dataset with two images and their bounding boxes and labels. It then prints the first image type and its target data.
PyTorch
import torch
from torch.utils.data import Dataset
from PIL import Image

class CustomDetectionDataset(Dataset):
    def __init__(self, image_paths, annotations, transforms=None):
        self.image_paths = image_paths
        self.annotations = annotations
        self.transforms = transforms

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert("RGB")
        boxes = self.annotations[idx]['boxes']
        labels = self.annotations[idx]['labels']
        target = {}
        target['boxes'] = torch.tensor(boxes, dtype=torch.float32)
        target['labels'] = torch.tensor(labels, dtype=torch.int64)
        if self.transforms:
            image, target = self.transforms(image, target)
        return image, target

# Sample data
image_paths = ["image1.jpg", "image2.jpg"]
annotations = [
    {"boxes": [[10, 20, 50, 60]], "labels": [1]},
    {"boxes": [[15, 25, 55, 65], [30, 40, 70, 80]], "labels": [2, 3]}
]

# Create small placeholder image files so the sample is runnable
for path in image_paths:
    Image.new("RGB", (100, 100)).save(path)

# Create dataset
dataset = CustomDetectionDataset(image_paths, annotations)

# Access first item
image, target = dataset[0]
print(f"Image type: {type(image)}")
print(f"Boxes: {target['boxes']}")
print(f"Labels: {target['labels']}")
Output
Image type: <class 'PIL.Image.Image'>
Boxes: tensor([[10., 20., 50., 60.]])
Labels: tensor([1])
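Because each image can contain a different number of boxes, PyTorch's default batching cannot stack detection targets into one tensor. A common workaround, sketched below (the detection_collate name is my own, not part of the sample above), is a collate_fn that keeps each sample's image and target paired in tuples:

```python
from torch.utils.data import DataLoader

def detection_collate(batch):
    # Unzip [(image, target), ...] into (images, targets) tuples
    # instead of stacking, since targets vary in size per image.
    return tuple(zip(*batch))

# Usage, assuming `dataset` is the CustomDetectionDataset from above:
# loader = DataLoader(dataset, batch_size=2, collate_fn=detection_collate)
# images, targets = next(iter(loader))
```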
Important Notes
Make sure bounding boxes are in the format [xmin, ymin, xmax, ymax].
Labels should be integers representing object classes.
Use transforms carefully to keep boxes and labels aligned with images.
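As a concrete sketch of what keeping boxes aligned means, a horizontal flip must also mirror each box's x-coordinates. The random_hflip helper below is illustrative, not part of PyTorch; it assumes PIL images and the (image, target) transform signature used above:

```python
import random

import torch
from PIL import Image, ImageOps

def random_hflip(image, target, p=0.5):
    # Flip the image and mirror each box's x-coordinates so the
    # annotations stay aligned with the flipped pixels.
    if random.random() < p:
        width = image.width
        image = ImageOps.mirror(image)
        boxes = target["boxes"].clone()
        # New xmin = width - old xmax; new xmax = width - old xmin.
        boxes[:, [0, 2]] = width - boxes[:, [2, 0]]
        target["boxes"] = boxes
    return image, target
```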
Summary
A custom detection dataset helps train models on your own images and labels.
It returns images and their bounding boxes with labels for each object.
Transforms can be added to change images and targets during loading.