
Custom Dataset class in PyTorch - ML Experiment: Train & Evaluate

Experiment - Custom Dataset class
Problem: You want to train a PyTorch model on your own data: images stored in a folder, with their labels listed in a separate CSV file. You do not yet have a way to load this data efficiently for training.
Current Metrics: None. No model has been trained yet because data loading is not implemented.
Issue: Without a custom Dataset class, you cannot load and preprocess your own data properly for PyTorch training.
Your Task
Create a custom PyTorch Dataset class that loads images from a directory and their labels from a CSV file, applies basic transformations, and can be used with a DataLoader for training.
Use PyTorch's Dataset class as the base.
Load images from a given directory and labels from a CSV file.
Apply a simple transformation to convert images to tensors.
Do not use ready-made dataset classes such as those in torchvision.datasets (torchvision.transforms is fine).
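The task assumes an image directory plus a CSV file of labels. To experiment without real data, a small synthetic dataset can be generated first. The following is a minimal sketch: the helper name make_toy_dataset, the file names, the image size, and the two classes are all made up for illustration, and the CSV layout (file name in the first column, label in the second, with a header row) is an assumption about how the labels are stored.

```python
import csv
import os
import tempfile

from PIL import Image


def make_toy_dataset(root, num_images=8, size=(32, 32)):
    """Write solid-colour RGB images and a labels.csv under `root` (hypothetical helper)."""
    img_dir = os.path.join(root, "images")
    os.makedirs(img_dir, exist_ok=True)
    csv_path = os.path.join(root, "labels.csv")

    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "label"])  # header row
        for i in range(num_images):
            filename = f"img_{i}.png"
            label = i % 2  # two made-up classes: 0 and 1
            color = (255, 0, 0) if label == 0 else (0, 0, 255)
            Image.new("RGB", size, color).save(os.path.join(img_dir, filename))
            writer.writerow([filename, label])
    return img_dir, csv_path


# Write the toy data into a fresh temporary directory.
root = tempfile.mkdtemp()
img_dir, csv_path = make_toy_dataset(root)
print(sorted(os.listdir(img_dir))[:3], csv_path)
```

The returned img_dir and csv_path can then be passed to a Dataset class that expects an image directory and a labels CSV.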
Solution
PyTorch
import os
from PIL import Image
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
import pandas as pd

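class CustomImageDataset(Dataset):
    """Pairs image files in img_dir with labels listed in a CSV file.

    The CSV is read with a header row; column 0 holds the image file name
    and column 1 holds the label.
    """

    def __init__(self, img_dir, labels_csv, transform=None):
        self.img_dir = img_dir
        self.labels_frame = pd.read_csv(labels_csv)
        self.transform = transform

    def __len__(self):
        # One sample per CSV row.
        return len(self.labels_frame)

    def __getitem__(self, idx):
        # Build the full path from the directory and the file name in column 0.
        img_name = os.path.join(self.img_dir, self.labels_frame.iloc[idx, 0])
        # Force three channels so grayscale and RGBA files give consistent tensors.
        image = Image.open(img_name).convert('RGB')
        label = self.labels_frame.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        return image, label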

# Example usage:
if __name__ == '__main__':
    transform = transforms.ToTensor()
    dataset = CustomImageDataset(img_dir='images', labels_csv='labels.csv', transform=transform)
    dataloader = DataLoader(dataset, batch_size=4, shuffle=True)

    for images, labels in dataloader:
        print(f'Batch images shape: {images.shape}')
        print(f'Batch labels: {labels}')
        break
Created a CustomImageDataset class inheriting from torch.utils.data.Dataset.
Implemented __init__ to store the image directory path and the transform, and to read the labels from the CSV file.
Implemented __len__ to return dataset size.
Implemented __getitem__ to load an image and its label, apply transform, and return them.
Added example usage with DataLoader to test batch loading.
Results Interpretation

Before: No data loading, no training possible.

After: Custom Dataset class loads images and labels correctly, enabling model training.

Creating a custom Dataset class in PyTorch allows you to load and preprocess your own data efficiently, which is essential for training models on custom datasets.
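To illustrate how batches from a DataLoader feed into training, here is a rough sketch of a single pass over the data. It is not part of the original solution: random tensors in a TensorDataset stand in for the custom dataset so the example runs on its own, and the tiny linear model, loss, optimizer, and learning rate are arbitrary assumptions.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in data with the same (image, label) structure a custom image dataset yields.
images = torch.rand(16, 3, 32, 32)
labels = torch.randint(0, 2, (16,))  # integer class labels, two made-up classes
loader = DataLoader(TensorDataset(images, labels), batch_size=4, shuffle=True)

# A deliberately tiny classifier: flatten each image, then one linear layer.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

losses = []
for batch_images, batch_labels in loader:
    optimizer.zero_grad()                      # clear gradients from the previous step
    loss = criterion(model(batch_images), batch_labels)
    loss.backward()                            # compute gradients
    optimizer.step()                           # update the weights
    losses.append(loss.item())

print(f"batches: {len(losses)}, last loss: {losses[-1]:.4f}")
```

Swapping the TensorDataset for a custom Dataset instance should leave the loop unchanged, provided the dataset returns same-sized image tensors and integer labels.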
Bonus Experiment
Extend the Custom Dataset class to include data augmentation such as random horizontal flips and random crops.
💡 Hint
Use torchvision.transforms.Compose to combine multiple transformations including RandomHorizontalFlip and RandomResizedCrop.