Built-in datasets help you quickly get real data to practice machine learning without searching or downloading manually.
Built-in datasets (torchvision.datasets) in PyTorch
Introduction
You want to learn image classification with common datasets like MNIST or CIFAR-10.
You need a quick dataset to test your model code.
You want to compare your model with others using standard datasets.
You want to avoid spending time on data cleaning and focus on model building.
Syntax
PyTorch
from torchvision import datasets

# Load MNIST dataset
mnist_train = datasets.MNIST(root='./data', train=True, download=True, transform=None)

# Load CIFAR-10 dataset
cifar10_train = datasets.CIFAR10(root='./data', train=True, download=True, transform=None)
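root is the folder where the data is saved to and loaded from.
train=True loads the training split; train=False loads the test split.
download=True downloads the data into root if it is not already there.
transform is an optional function applied to each image when it is accessed; with transform=None, images are returned as PIL images.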
Examples
Load the MNIST test dataset.
PyTorch
from torchvision import datasets

mnist_test = datasets.MNIST(root='./data', train=False, download=True)
Load CIFAR-10 training data and convert images to tensors for PyTorch.
PyTorch
from torchvision import datasets, transforms

transform = transforms.ToTensor()
cifar10_train = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
Load the FashionMNIST training dataset, which contains grayscale images of clothing items.
PyTorch
from torchvision import datasets

fashion_train = datasets.FashionMNIST(root='./data', train=True, download=True)
Sample Model
This code loads the MNIST training data, converts images to tensors, and prints the shape of one batch of images and labels along with the first 10 labels.
PyTorch
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define a transform to convert images to tensors
transform = transforms.ToTensor()

# Load MNIST training dataset
mnist_train = datasets.MNIST(root='./data', train=True, download=True, transform=transform)

# Create a data loader to iterate over dataset
train_loader = DataLoader(mnist_train, batch_size=64, shuffle=True)

# Get one batch of images and labels
images, labels = next(iter(train_loader))

print(f'Batch image tensor shape: {images.shape}')
print(f'Batch label tensor shape: {labels.shape}')
print(f'First 10 labels in batch: {labels[:10].tolist()}')
Important Notes
With download=True, built-in datasets download the data only if it is not already present in root; with download=False, the data must already be there.
Transforms help convert raw data into a format your model can use.
Use DataLoader to easily batch and shuffle data for training.
Summary
Built-in datasets in torchvision provide easy access to popular image datasets.
They save time by handling download and loading for you.
Use transforms and DataLoader to prepare data for your model.