A custom detection dataset teaches a model to find the objects that matter to you in your own images. It lets the model learn from your own photos and labels instead of a public dataset.
Custom detection dataset in PyTorch
Introduction
When you have your own photos and want to find specific objects in them.
When public datasets don't have the objects you care about.
When you want to train a model to detect items in a new environment, like your home or workplace.
When you want to improve detection accuracy by using your own labeled images.
When you want to test how well a detection model works on your own data.
Syntax
PyTorch
class CustomDetectionDataset(torch.utils.data.Dataset):
    def __init__(self, image_paths, annotations, transforms=None):
        self.image_paths = image_paths
        self.annotations = annotations
        self.transforms = transforms

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert("RGB")
        boxes = self.annotations[idx]['boxes']    # list of [xmin, ymin, xmax, ymax]
        labels = self.annotations[idx]['labels']  # list of integer class labels
        target = {}
        target['boxes'] = torch.tensor(boxes, dtype=torch.float32)
        target['labels'] = torch.tensor(labels, dtype=torch.int64)
        if self.transforms:
            image, target = self.transforms(image, target)
        return image, target
The __getitem__ method returns one image and its target (bounding boxes and labels) at a time.
Annotations must include bounding boxes and labels for each object.
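For example, the annotations list expected by the class might look like this for two images (the box coordinates and class IDs here are made up for illustration):

```python
# One dict per image; each dict pairs a list of [xmin, ymin, xmax, ymax]
# boxes with a list of integer class labels of the same length.
annotations = [
    {"boxes": [[10, 20, 50, 60]], "labels": [1]},
    {"boxes": [[15, 25, 55, 65], [30, 40, 70, 80]], "labels": [2, 3]},
]

# Each image must have exactly one label per box.
for entry in annotations:
    assert len(entry["boxes"]) == len(entry["labels"])
```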
Examples
Create the dataset with lists of image file paths and their annotations.
PyTorch
dataset = CustomDetectionDataset(image_paths, annotations)
Get the first image and its target data (boxes and labels).
PyTorch
image, target = dataset[0]
print(image.size, target)
Add image and target transformations, such as resizing or flipping, during dataset loading. Because the dataset calls self.transforms(image, target), the transform must accept and return both the image and its target; a plain torchvision transforms.Compose, which takes only the image, will not work here.
PyTorch
def detection_transforms(image, target):
    # Apply paired transformations here, updating target['boxes'] to match.
    return image, target

dataset = CustomDetectionDataset(image_paths, annotations, transforms=detection_transforms)
Sample Model
This code creates a simple dataset with two images and their bounding boxes and labels. It then prints the first image type and its target data.
PyTorch
import torch
from torch.utils.data import Dataset
from PIL import Image

class CustomDetectionDataset(Dataset):
    def __init__(self, image_paths, annotations, transforms=None):
        self.image_paths = image_paths
        self.annotations = annotations
        self.transforms = transforms

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert("RGB")
        boxes = self.annotations[idx]['boxes']
        labels = self.annotations[idx]['labels']
        target = {}
        target['boxes'] = torch.tensor(boxes, dtype=torch.float32)
        target['labels'] = torch.tensor(labels, dtype=torch.int64)
        if self.transforms:
            image, target = self.transforms(image, target)
        return image, target

# Sample data
image_paths = ["image1.jpg", "image2.jpg"]
annotations = [
    {"boxes": [[10, 20, 50, 60]], "labels": [1]},
    {"boxes": [[15, 25, 55, 65], [30, 40, 70, 80]], "labels": [2, 3]}
]

# Create small placeholder image files so the sample is runnable
for path in image_paths:
    Image.new("RGB", (100, 100)).save(path)

# Create dataset
dataset = CustomDetectionDataset(image_paths, annotations)

# Access first item
image, target = dataset[0]
print(f"Image type: {type(image)}")
print(f"Boxes: {target['boxes']}")
print(f"Labels: {target['labels']}")
Output
Image type: <class 'PIL.Image.Image'>
Boxes: tensor([[10., 20., 50., 60.]])
Labels: tensor([1])
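Because each image can contain a different number of boxes, PyTorch's default batching cannot stack detection targets into one tensor. A common workaround, sketched below (the detection_collate name is my own, not part of the sample above), is a collate_fn that keeps each sample's image and target paired in tuples:

```python
from torch.utils.data import DataLoader

def detection_collate(batch):
    # Unzip [(image, target), ...] into (images, targets) tuples
    # instead of stacking, since targets vary in size per image.
    return tuple(zip(*batch))

# Usage, assuming `dataset` is the CustomDetectionDataset from above:
# loader = DataLoader(dataset, batch_size=2, collate_fn=detection_collate)
# images, targets = next(iter(loader))
```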
Important Notes
Make sure bounding boxes are in the format [xmin, ymin, xmax, ymax].
Labels should be integers representing object classes.
Use transforms carefully to keep boxes and labels aligned with images.
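As a concrete sketch of what keeping boxes aligned means, a horizontal flip must also mirror each box's x-coordinates. The random_hflip helper below is illustrative, not part of PyTorch; it assumes PIL images and the (image, target) transform signature used above:

```python
import random

import torch
from PIL import Image, ImageOps

def random_hflip(image, target, p=0.5):
    # Flip the image and mirror each box's x-coordinates so the
    # annotations stay aligned with the flipped pixels.
    if random.random() < p:
        width = image.width
        image = ImageOps.mirror(image)
        boxes = target["boxes"].clone()
        # New xmin = width - old xmax; new xmax = width - old xmin.
        boxes[:, [0, 2]] = width - boxes[:, [2, 0]]
        target["boxes"] = boxes
    return image, target
```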
Summary
A custom detection dataset helps train models on your own images and labels.
It returns images and their bounding boxes with labels for each object.
Transforms can be added to change images and targets during loading.