How to Use ImageFolder in PyTorch for Image Datasets

Use torchvision.datasets.ImageFolder to load images arranged in one folder per class. It assigns integer labels from the sorted folder names and works directly with DataLoader for batching and shuffling.

Syntax

The ImageFolder class loads images from a root directory in which each subfolder represents one class. Indexing the dataset returns an (image, label) pair, where label is the integer index of the class.

Key parts:

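  • root: path to the main folder containing the class subfolders.
  • transform: optional function applied to each image, such as resizing, tensor conversion, or normalization.
  • target_transform: optional function applied to each label.
  • loader: optional function that loads an image from its file path.
  • is_valid_file: optional function that takes a file path and returns whether the file should be included.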
python
torchvision.datasets.ImageFolder(root, transform=None, target_transform=None, loader=None, is_valid_file=None)
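To make the labeling rule concrete, here is a minimal standard-library sketch of how folder names turn into integer labels: subfolder names are sorted and numbered from 0. The helper folder_labels and the class names are hypothetical and only illustrate the idea; on a real dataset the same mapping is available as dataset.class_to_idx.

```python
import os
import tempfile


def folder_labels(root):
    """Map each class subfolder of `root` to an integer label, in sorted order.

    Hypothetical helper for illustration only; it is not part of torchvision.
    """
    class_names = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: index for index, name in enumerate(class_names)}


# Build a throwaway directory tree with made-up class names.
with tempfile.TemporaryDirectory() as root:
    for name in ("dogs", "cats", "birds"):
        os.makedirs(os.path.join(root, name))
    mapping = folder_labels(root)

print(mapping)  # {'birds': 0, 'cats': 1, 'dogs': 2}
```

Because the order is alphabetical, adding or renaming a class folder can shift the label of every class that sorts after it.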

Example

This example shows how to load images from a folder, apply basic transformations, and use a DataLoader to iterate batches.

python
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define image transformations
transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

# Load dataset from folder 'data/train'
dataset = datasets.ImageFolder(root='data/train', transform=transform)

# Create DataLoader for batching and shuffling
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)

# Iterate one batch
images, labels = next(iter(dataloader))
print(f'Batch image tensor shape: {images.shape}')
print(f'Batch labels: {labels}')
Output
Batch image tensor shape: torch.Size([4, 3, 64, 64])
Batch labels: tensor([0, 1, 0, 2])

Common Pitfalls

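  • Folder structure: Images must sit inside subfolders named by class. Images placed directly in root are not picked up, and labels follow the alphabetical order of the folder names.
  • Transforms: Without ToTensor, samples stay PIL images and the default DataLoader collation fails. Without resizing, images of different sizes cannot be stacked into one batch.
  • Path errors: A wrong root path, or a root with no class subfolders, raises an error (FileNotFoundError in recent torchvision releases) rather than producing an empty dataset.
  • DataLoader shuffle: DataLoader does not shuffle by default. Because ImageFolder orders samples class by class, unshuffled training batches tend to contain a single class, which can hurt training.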
python
from torchvision import datasets, transforms

# Wrong: No transform, images stay as PIL images
wrong_dataset = datasets.ImageFolder(root='data/train')

# Right: Apply ToTensor transform
correct_dataset = datasets.ImageFolder(root='data/train', transform=transforms.ToTensor())
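The folder-structure and path pitfalls listed above can be caught before training with a small pre-flight check. The sketch below uses only the standard library; count_images_per_class and the IMAGE_EXTENSIONS set are assumptions made for illustration, not torchvision APIs, and the extension list is a subset chosen for the example.

```python
import tempfile
from pathlib import Path

# Assumed subset of image extensions, chosen for this example only.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def count_images_per_class(root):
    """Return {class_name: image_count} for each subfolder of `root`.

    Hypothetical helper: raises early if the root is missing or has no
    class subfolders, so the problem surfaces before training starts.
    """
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"Root folder not found: {root}")
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    if not counts:
        raise ValueError(f"No class subfolders found in {root}")
    return counts


# Demo on a temporary tree with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    layout = ["cats/a.jpg", "cats/b.png", "dogs/c.jpg", "dogs/notes.txt", "stray.jpg"]
    for relative in layout:
        path = Path(tmp) / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    counts = count_images_per_class(tmp)

print(counts)  # {'cats': 2, 'dogs': 1}
```

The stray.jpg file directly under the root and the notes.txt file are not counted, which mirrors the first pitfall: only images inside class subfolders belong to the dataset.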

Quick Reference

Remember these tips when using ImageFolder:

  • Organize images in root/class_name/image.jpg format.
  • Use transform to preprocess images before training.
  • Use DataLoader to batch and shuffle data.
  • Check dataset size with len(dataset).

Key Takeaways

ImageFolder loads images from folders named by class and assigns labels automatically.
Apply ToTensor, and usually Normalize, so that samples reach the model as tensors in a consistent value range.
Use DataLoader with shuffle=True for training so that samples are reshuffled every epoch.
Ensure your dataset folder structure matches ImageFolder expectations.
Check dataset length and sample shapes to verify loading correctness.