How to Use ImageFolder in PyTorch for Image Datasets
Use torchvision.datasets.ImageFolder to load images arranged in folders by class. It automatically assigns labels based on folder names and works well with DataLoader for batching and shuffling.
Syntax
The ImageFolder class loads images from a root directory in which each subfolder represents one class. Indexing the dataset returns an (image, label) pair, where label is the integer index of the image's class folder.
Key parts:
- root: path to the main folder containing class subfolders.
- transform: optional image transformations like resizing or normalization.
- target_transform: optional transformation on labels.
python
torchvision.datasets.ImageFolder(root, transform=None, target_transform=None, loader=default_loader, is_valid_file=None)
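To make the label assignment concrete, here is a minimal sketch of how folder names turn into integer labels: subfolder names are sorted and numbered from 0. The function find_classes_sketch and the folder names are hypothetical and only mirror the documented behaviour; with a real dataset, read the mapping from dataset.class_to_idx instead.

```python
import tempfile
from pathlib import Path


def find_classes_sketch(root: Path) -> dict[str, int]:
    """Map each subfolder name under root to an integer label, in sorted order."""
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(class_names)}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical class folders, created in non-alphabetical order on purpose
    for name in ["dog", "cat", "bird"]:
        (root / name).mkdir()
    # A file directly under root is not a class folder, so it gets no label
    (root / "notes.txt").write_text("not a class")
    mapping = find_classes_sketch(root)

print(mapping)  # {'bird': 0, 'cat': 1, 'dog': 2}
```

Because the numbering follows sorted order, adding or renaming a class folder can shift the label of every class that sorts after it.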
Example
This example shows how to load images from a folder, apply basic transformations, and use a DataLoader to iterate batches.
python
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define image transformations
transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

# Load dataset from folder 'data/train'
dataset = datasets.ImageFolder(root='data/train', transform=transform)

# Create DataLoader for batching and shuffling
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)

# Iterate one batch
images, labels = next(iter(dataloader))
print(f'Batch image tensor shape: {images.shape}')
print(f'Batch labels: {labels}')
Output (labels vary between runs because shuffle=True)
Batch image tensor shape: torch.Size([4, 3, 64, 64])
Batch labels: tensor([0, 1, 0, 2])
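The Normalize step in the example can be checked by hand. ToTensor scales pixel values to the range [0, 1], and Normalize then computes (value - mean) / std per channel, so a mean and std of 0.5 map that range to [-1, 1]. The helper normalize below is a hypothetical plain-Python stand-in for that arithmetic on a single value, not the torchvision transform itself.

```python
def normalize(value: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Apply the per-channel formula (value - mean) / std to one pixel value."""
    return (value - mean) / std


# Darkest, middle, and brightest pixel values after ToTensor
for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
# 0.0 -> -1.0
# 0.5 -> 0.0
# 1.0 -> 1.0
```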
Common Pitfalls
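- Folder structure: Images must be inside subfolders named by class. Files placed directly in the root folder are not picked up, and every subfolder name becomes a class.
- Transforms: Without ToTensor, the dataset returns PIL images, which the default DataLoader collation cannot stack into a batch. Skipping normalization does not raise an error but can slow or destabilize training.
- Path errors: A wrong root path, or a root with no class subfolders, raises an error (FileNotFoundError in recent torchvision versions) instead of loading data.
- DataLoader shuffle: ImageFolder orders samples by class, so without shuffle=True consecutive training batches contain mostly one class, which hurts training.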
python
from torchvision import datasets, transforms

# Wrong: No transform, images stay as PIL images
wrong_dataset = datasets.ImageFolder(root='data/train')

# Right: Apply ToTensor transform
correct_dataset = datasets.ImageFolder(root='data/train', transform=transforms.ToTensor())
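The folder-structure and path pitfalls can be caught before any training code runs. The following is a rough pre-flight check using only the standard library; summarize_image_root, IMAGE_EXTENSIONS, and the folder names are hypothetical, and the extension set is an assumed subset rather than the exact list torchvision accepts.

```python
import tempfile
from pathlib import Path

# Assumption: a small subset of common image extensions, for illustration only
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def summarize_image_root(root: Path) -> dict:
    """Count images per class folder and list files left directly under root."""
    if not root.is_dir():
        raise FileNotFoundError(f"Root folder does not exist: {root}")
    stray_files = sorted(p.name for p in root.iterdir() if p.is_file())
    per_class = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Count image files in the class folder, including nested folders
        per_class[class_dir.name] = sum(
            1
            for p in class_dir.rglob("*")
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
    return {"per_class": per_class, "stray_files": stray_files}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "cats").mkdir()
    (root / "dogs").mkdir()
    (root / "cats" / "a.jpg").touch()  # placeholder file, only its name matters here
    (root / "loose.jpg").touch()       # misplaced: sits outside any class folder
    summary = summarize_image_root(root)

print(summary)
# {'per_class': {'cats': 1, 'dogs': 0}, 'stray_files': ['loose.jpg']}
```

A class with a count of zero or a non-empty stray_files list usually points to a layout problem worth fixing before building the dataset.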
Quick Reference
Remember these tips when using ImageFolder:
- Organize images in root/class_name/image.jpg format.
- Use transform to preprocess images before training.
- Use DataLoader to batch and shuffle data.
- Check dataset size with len(dataset).
Key Takeaways
ImageFolder loads images from folders named by class and assigns labels automatically.
Always apply transforms like ToTensor and normalization for proper model input.
Use DataLoader with shuffle=True so each training epoch sees the samples in a new random order.
Ensure your dataset folder structure matches ImageFolder expectations.
Check dataset length and sample shapes to verify loading correctness.