
Data loading with torchvision in Computer Vision

Introduction

torchvision provides ready-made dataset classes and image transforms. Combined with PyTorch's DataLoader, they turn image files into batches of tensors that a model can train on, handling file reading, labeling, and preprocessing in a few lines.

Use it in situations like these:
When you want to train a model to recognize objects in photos.
When you need to load many images from folders for a project.
When you want to apply simple changes, such as resizing, to images before training.
When you want to group images into batches for faster training.
Syntax
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define image transformations
transform = transforms.Compose([
    transforms.Resize((28, 28)),
    transforms.ToTensor()
])

# Load dataset
dataset = datasets.ImageFolder(root='path_to_images', transform=transform)

# Create data loader
loader = DataLoader(dataset, batch_size=32, shuffle=True)

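datasets.ImageFolder expects one subfolder per class under root. Each subfolder's name becomes a class label, and the classes are assigned integer indices in alphabetical order.

transforms.Compose chains transforms and applies them in the order listed, here resizing each image and then converting it to a tensor.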

Examples
This example resizes images to 64x64 pixels and loads them in batches of 16.
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor()
])
dataset = datasets.ImageFolder('data/train', transform=transform)
loader = DataLoader(dataset, batch_size=16, shuffle=True)
This example downloads the CIFAR10 training set into ./data (if it is not already there) and converts each image to a tensor.
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

transform = transforms.ToTensor()
dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
loader = DataLoader(dataset, batch_size=64, shuffle=True)
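Sample Program

This program loads the MNIST dataset, applies a resize to 28x28 (MNIST images are already that size, so this step changes nothing and is kept only to show the pattern), converts the images to tensors, and loads them in batches of 32. It prints the shape of one batch and the first 5 labels.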

import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define transformations to resize and convert images to tensors
transform = transforms.Compose([
    transforms.Resize((28, 28)),
    transforms.ToTensor()
])

# Load MNIST dataset (handwritten digits)
dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)

# Create data loader with batch size 32
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Get one batch of images and labels
images, labels = next(iter(loader))

print(f'Batch image tensor shape: {images.shape}')
print(f'Batch labels tensor shape: {labels.shape}')
print(f'First 5 labels in batch: {labels[:5].tolist()}')
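Output

After any first-run download messages, the program prints torch.Size([32, 1, 28, 28]) for the image batch and torch.Size([32]) for the labels, followed by five digit labels that vary between runs because the loader shuffles the data.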
Important Notes

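transforms.ToTensor() converts a PIL image into a float tensor of shape (channels, height, width) with pixel values scaled from 0-255 to the range 0 to 1. Include it in your transforms so the DataLoader can stack images into batches.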

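Setting shuffle=True reorders the samples at the start of every epoch, so the model does not see the images in the same order each time. Use it for training data; validation and test data are usually left unshuffled.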
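The effect of shuffling is easy to see on a toy dataset. This sketch uses ten numbers in place of images; the TensorDataset and the epoch_order helper are illustrative additions, not part of the original program.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten samples whose values are their own indices, so the order is visible.
dataset = TensorDataset(torch.arange(10))

def epoch_order(loader):
    """Return the sample values in the order the loader yields them."""
    return [int(x) for (batch,) in loader for x in batch]

ordered = DataLoader(dataset, batch_size=4, shuffle=False)
shuffled = DataLoader(dataset, batch_size=4, shuffle=True)

print(epoch_order(ordered))   # the samples in their original order
print(epoch_order(shuffled))  # a permutation of the same samples
print(epoch_order(shuffled))  # reshuffled again for the next epoch
```

Every epoch still visits each sample exactly once; only the order changes.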

Batch size controls how many images the model sees at once. Smaller batches use less memory but need more steps to go through the whole dataset.
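To see how batch size relates to the number of steps per epoch, the sketch below uses 100 blank stand-in images (an assumed toy dataset, not MNIST) and counts the batches for a few batch sizes.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 100 stand-in "images" of shape 1x28x28 with integer labels.
dataset = TensorDataset(
    torch.zeros(100, 1, 28, 28),
    torch.zeros(100, dtype=torch.long),
)

for batch_size in (16, 32, 64):
    loader = DataLoader(dataset, batch_size=batch_size)
    sizes = [len(images) for images, _ in loader]
    # Print the batch size, the number of batches, and the last batch's size.
    print(batch_size, len(loader), sizes[-1])

# drop_last=True discards the final partial batch.
loader = DataLoader(dataset, batch_size=32, drop_last=True)
print(len(loader))
```

When the dataset size is not a multiple of the batch size, the last batch is smaller unless drop_last=True is set.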

Summary

Use torchvision's datasets together with DataLoader from torch.utils.data to load and prepare image data.

Apply transforms to resize and convert images before training.

Load data in batches and shuffle to improve training efficiency and quality.