DataLoader feeds your data to the model in small batches, so training can run faster and use less memory than loading the whole dataset at once.
DataLoader basics in PyTorch
torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False, num_workers=0)
dataset: Your data wrapped in a Dataset object.
batch_size: Number of samples per batch.
shuffle: Whether to reshuffle the data at the start of every epoch.
num_workers: Number of worker subprocesses that load data in parallel (0 means data loads in the main process).
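If your data is not already in tensors, you can wrap it in your own Dataset by implementing __len__ and __getitem__. A minimal sketch; the SquaresDataset class and its contents are illustrative, not part of PyTorch:

import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Toy dataset mapping each integer to its square (hypothetical example)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # DataLoader uses this to know how many samples exist
        return self.n

    def __getitem__(self, idx):
        # Return one (feature, label) pair as tensors
        x = torch.tensor([float(idx)])
        y = torch.tensor(float(idx) ** 2)
        return x, y

loader = DataLoader(SquaresDataset(10), batch_size=4)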
loader = DataLoader(my_dataset, batch_size=4)
loader = DataLoader(my_dataset, batch_size=8, shuffle=True)
loader = DataLoader(my_dataset, batch_size=2, num_workers=2)
This code creates 10 samples as tensors, wraps them in a TensorDataset, and passes that to a DataLoader with batch size 3 and shuffling enabled. It then prints each batch's features and labels.
import torch
from torch.utils.data import DataLoader, TensorDataset

# Create sample data tensors
features = torch.arange(10).float().reshape(-1, 1)  # 10 samples, 1 feature each
labels = torch.arange(10).float()                   # 10 labels

# Wrap data in TensorDataset
dataset = TensorDataset(features, labels)

# Create DataLoader with batch size 3 and shuffle=True
loader = DataLoader(dataset, batch_size=3, shuffle=True)

# Loop over DataLoader and print batches
for batch_idx, (x, y) in enumerate(loader):
    print(f"Batch {batch_idx + 1}:")
    print("Features:", x.squeeze().tolist())
    print("Labels:", y.tolist())
    print()
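Note that with 10 samples and batch_size=3, the loop yields four batches, and the last one holds a single leftover sample. If incomplete batches cause problems (for example, with batch normalization), DataLoader's drop_last flag discards them:

# Yields 3 full batches per epoch and drops the leftover sample
loader = DataLoader(dataset, batch_size=3, shuffle=True, drop_last=True)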
Setting shuffle=True reshuffles the dataset at the start of every epoch, so the model never sees the samples in the same fixed order, which usually helps it generalize instead of memorizing sequence patterns.
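You can see the reshuffling directly by pulling one full-size batch per "epoch" from a fresh iterator; a minimal sketch:

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(10))
loader = DataLoader(dataset, batch_size=10, shuffle=True)

for epoch in range(2):
    (batch,) = next(iter(loader))  # fresh iterator => fresh shuffle
    print(f"Epoch {epoch}:", batch.tolist())  # order differs between runs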
Setting num_workers > 0 spawns that many worker subprocesses to load batches in parallel, which speeds up training on machines with multiple CPU cores.
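Because workers are started via multiprocessing, on platforms that spawn new processes (Windows, and macOS by default) the DataLoader loop should live under a main guard. A sketch reusing the toy tensors from the earlier example:

import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    features = torch.arange(10).float().reshape(-1, 1)
    labels = torch.arange(10).float()
    dataset = TensorDataset(features, labels)

    # Two worker subprocesses fetch batches in the background
    loader = DataLoader(dataset, batch_size=3, num_workers=2)
    for x, y in loader:
        pass  # training step would go here

if __name__ == "__main__":
    main()  # guard required when workers are spawned as new processes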
Batch size trades memory for speed: smaller batches use less memory per step, but need more steps (and often more wall-clock time) per epoch.
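The step count is easy to check, since calling len() on a DataLoader over a map-style dataset reports the number of batches per epoch:

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(1000).float())

for batch_size in (16, 64, 256):
    loader = DataLoader(dataset, batch_size=batch_size)
    # 1000 samples => ceil(1000 / batch_size) batches per epoch
    print(batch_size, "->", len(loader), "batches per epoch")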
DataLoader splits your dataset into small batches for easier training.
It can shuffle data and load it in parallel to improve training quality and speed.
Use DataLoader as the inner loop over your data in every training epoch.
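Putting it together, here is a minimal sketch of a training loop built around a DataLoader; the linear model and toy regression data are illustrative stand-ins, not a prescribed setup:

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy regression data: y = 2x (illustrative)
features = torch.arange(10).float().reshape(-1, 1)
labels = (2 * torch.arange(10).float()).reshape(-1, 1)
dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=3, shuffle=True)

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for epoch in range(5):
    for x, y in loader:  # the DataLoader drives the inner loop
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch}: loss {loss.item():.4f}")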