
Custom object detection dataset in Computer Vision


Introduction

A custom object detection dataset is a set of images labeled with the class and location of each object of interest. Training on it teaches a model to find the specific objects that matter to you.

You want a computer to find your own special objects, like your pet or your car.

You have images from your workplace and want to detect tools or products.

You want to build a smart camera that spots certain items in real time.

You need to train a model to detect objects that are not in public datasets.

You want to improve safety by detecting hazards in images from your environment.

Syntax

Dataset structure:
- Images folder: contains all images (e.g., image1.jpg, image2.jpg)
- Annotations folder: contains label files (e.g., image1.txt, image2.txt)

Annotation format (YOLO style example):
<class_id> <x_center> <y_center> <width> <height>

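The four box values are normalized to the range 0 to 1: x_center and width are divided by the image width, y_center and height by the image height. class_id is an integer index, typically starting at 0.

Each annotation file must have the same base name as its image, differing only in the extension.

Normalized coordinates stay valid when an image is resized, which keeps the dataset usable at different image sizes.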
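
As a rough illustration of the format above, the sketch below turns one pixel-space box into an annotation line. The function name to_yolo_line and the corner-based input (xmin, ymin, xmax, ymax) are assumptions made for this example, not part of any particular labeling tool.

```python
def to_yolo_line(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Format one pixel-space box as a YOLO-style annotation line.

    Hypothetical helper: assumes the box is given as pixel corners.
    """
    # Center and size, each divided by the matching image dimension
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 128x144 pixel box centered in a 640x480 image
print(to_yolo_line(0, 256, 168, 384, 312, 640, 480))
# prints: 0 0.500000 0.500000 0.200000 0.300000
```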

Examples

The annotation file below says that image1.jpg contains two objects: class 0 centered at (50%, 50%) of the image with width 20% and height 30%, and class 1 centered at (70%, 80%) with width and height both 10%.

Annotations for image1.txt:
0 0.5 0.5 0.2 0.3
1 0.7 0.8 0.1 0.1
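
To make the numbers above concrete, here is a small sketch that converts both lines into pixel corner boxes. The 640x480 image size and the helper name yolo_to_corners are assumptions chosen for illustration; the source does not state the size of image1.jpg.

```python
def yolo_to_corners(line, img_w, img_h):
    """Parse one annotation line into (class_id, (xmin, ymin, xmax, ymax)) in pixels."""
    class_id, x_c, y_c, w, h = line.split()
    # Scale the normalized values back to pixels
    x_c, w = float(x_c) * img_w, float(w) * img_w
    y_c, h = float(y_c) * img_h, float(h) * img_h
    # Move from center/size form to corner form
    return int(class_id), (x_c - w / 2, y_c - h / 2, x_c + w / 2, y_c + h / 2)


# Assumed image size: 640 pixels wide, 480 pixels high
for line in ["0 0.5 0.5 0.2 0.3", "1 0.7 0.8 0.1 0.1"]:
    class_id, box = yolo_to_corners(line, 640, 480)
    print(class_id, [round(v) for v in box])
# prints:
# 0 [256, 168, 384, 312]
# 1 [416, 360, 480, 408]
```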

Images and labels are stored separately but linked by filename.

Folder structure:
/images/image1.jpg
/images/image2.jpg
/labels/image1.txt
/labels/image2.txt
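
Because images and labels are linked only by filename, a quick pairing check can catch missing files before training. The sketch below is one possible way to do it; find_unpaired and IMAGE_EXTS are hypothetical names, and the temporary folders exist only so the example runs on its own.

```python
import tempfile
from pathlib import Path

IMAGE_EXTS = {".jpg", ".png"}  # assumption: the extensions used in this tutorial


def find_unpaired(image_dir, label_dir):
    """Return (images without a label file, label files without an image) as base names."""
    image_stems = {p.stem for p in Path(image_dir).iterdir()
                   if p.suffix.lower() in IMAGE_EXTS}
    label_stems = {p.stem for p in Path(label_dir).glob("*.txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


# Build a tiny throwaway dataset where image2.jpg has no label file
with tempfile.TemporaryDirectory() as root:
    images, labels = Path(root, "images"), Path(root, "labels")
    images.mkdir()
    labels.mkdir()
    for name in ("image1.jpg", "image2.jpg"):
        (images / name).touch()
    (labels / "image1.txt").touch()
    missing_labels, orphan_labels = find_unpaired(images, labels)
    print(missing_labels, orphan_labels)
# prints: ['image2'] []
```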

Sample Model

This code loads images and their labels from folders. It reads bounding boxes in normalized form and converts them to pixel values. Then it prints the objects found in each image.

import os
from PIL import Image

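def load_dataset(image_dir, label_dir):
    """Pair each image with its label file and convert boxes to pixel values."""
    dataset = []
    # sorted() gives a stable order; os.listdir returns files in arbitrary order
    for filename in sorted(os.listdir(image_dir)):
        # Compare in lowercase so files like IMAGE1.JPG are not skipped
        if not filename.lower().endswith(('.jpg', '.png')):
            continue
        image_path = os.path.join(image_dir, filename)
        label_path = os.path.join(label_dir, os.path.splitext(filename)[0] + '.txt')
        if not os.path.exists(label_path):
            continue  # skip images that have no label file
        with open(label_path, 'r') as f:
            # Ignore blank lines so they do not break the unpacking below
            labels = [line.split() for line in f if line.strip()]
        # Only the size is needed; the context manager closes the image file
        with Image.open(image_path) as image:
            width, height = image.size
        objects = []
        for label in labels:
            class_id, x_c, y_c, w, h = label
            x_c, y_c, w, h = map(float, (x_c, y_c, w, h))
            # Convert normalized values to pixel coordinates
            x_center = x_c * width
            y_center = y_c * height
            box_width = w * width
            box_height = h * height
            objects.append({
                'class_id': int(class_id),
                # Pixel box as [x_center, y_center, width, height]
                'bbox': [x_center, y_center, box_width, box_height]
            })
        dataset.append({'image': filename, 'objects': objects})
    return dataset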

# Example usage
image_dir = 'images'
label_dir = 'labels'
dataset = load_dataset(image_dir, label_dir)
for data in dataset:
    print(f"Image: {data['image']}")
    for obj in data['objects']:
        print(f" Class {obj['class_id']} at {obj['bbox']}")
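
The loader above assumes every label line is well formed. A minimal sketch of a per-line check is shown here; check_yolo_line and its num_classes parameter are hypothetical names for this example, and the rules (five fields, values within 0 to 1, positive box size) are assumptions drawn from the format described earlier.

```python
def check_yolo_line(line, num_classes):
    """Return a list of problems found in one annotation line (empty if none)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        x_c, y_c, w, h = map(float, parts[1:])
    except ValueError:
        return ["could not parse numeric fields"]
    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class_id {class_id} outside 0..{num_classes - 1}")
    if not all(0.0 <= v <= 1.0 for v in (x_c, y_c, w, h)):
        problems.append("coordinate outside [0, 1]")
    if w <= 0 or h <= 0:
        problems.append("box has zero or negative size")
    return problems


print(check_yolo_line("0 0.5 0.5 0.2 0.3", num_classes=2))
print(check_yolo_line("2 0.5 0.5 1.2 0.3", num_classes=2))
```

The first call returns an empty list. The second reports two problems: a class id that is out of range and a width greater than 1.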

Important Notes

Make sure your annotation format matches the model you plan to use.

Keep your dataset balanced with enough examples for each class.

Use tools like labelImg or makesense.ai to create annotations easily.
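
One simple way to check the balance mentioned in the notes above is to count boxes per class. This is only a sketch: count_classes is a hypothetical helper, and the three sample lines are made up for illustration.

```python
from collections import Counter


def count_classes(label_lines):
    """Count boxes per class_id across an iterable of annotation lines."""
    # The class id is the first field of each non-blank line
    return Counter(int(line.split()[0]) for line in label_lines if line.strip())


# Made-up annotation lines: two boxes of class 0, one of class 1
lines = ["0 0.5 0.5 0.2 0.3", "1 0.7 0.8 0.1 0.1", "0 0.3 0.4 0.1 0.2"]
print(count_classes(lines))
# prints: Counter({0: 2, 1: 1})
```

In practice the lines would come from every file in the labels folder; a class with far fewer boxes than the others is a candidate for more labeled images.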

Summary

Custom datasets teach a model to find the specific objects you care about.

Annotations link images to object locations using normalized coordinates.

Organize images and labels carefully for smooth training.