Why DataLoader handles batching and shuffling in PyTorch

DataLoader helps by splitting data into small groups (batches) and mixing the data order (shuffling) so the model learns better and faster.
torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True, ...)
batch_size controls how many samples are in one group.
shuffle=True means the data order is reshuffled at the start of every pass over the dataset (epoch).
loader = DataLoader(my_dataset, batch_size=10, shuffle=True)   # batches of 10, new order each epoch
loader = DataLoader(my_dataset, batch_size=5, shuffle=False)   # batches of 5, fixed dataset order
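To make the difference between the two settings concrete, here is a minimal sketch (the tiny dataset is made up purely for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Tiny illustrative dataset: 6 samples, one feature each, labels 0..5
data = TensorDataset(torch.arange(6).float().unsqueeze(1), torch.arange(6))

# shuffle=False: batches always come out in dataset order
ordered = DataLoader(data, batch_size=2, shuffle=False)
print([y.tolist() for _, y in ordered])  # [[0, 1], [2, 3], [4, 5]] every epoch

# shuffle=True: the order is re-drawn at the start of each epoch
shuffled = DataLoader(data, batch_size=2, shuffle=True)
print([y.tolist() for _, y in shuffled])  # a different permutation of the same labels
```

Iterating the shuffled loader again (a second epoch) produces yet another order, while the unshuffled loader is always identical.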
This code creates a small dataset and uses DataLoader to split it into batches of 3 samples. It also shuffles the data, so the batches contain different samples each epoch.
import torch
from torch.utils.data import DataLoader, TensorDataset

# Create sample data: 10 samples, each with 2 features
features = torch.arange(20).reshape(10, 2).float()
labels = torch.arange(10)

# Create dataset
dataset = TensorDataset(features, labels)

# DataLoader with batch_size=3 and shuffle=True
loader = DataLoader(dataset, batch_size=3, shuffle=True)

print("Batches:")
for batch_idx, (x, y) in enumerate(loader):
    print(f"Batch {batch_idx + 1} - features:\n{x}\nlabels: {y}\n")
Shuffling helps the model not to learn the order of data, which improves generalization.
Batching speeds up training by processing multiple samples at once instead of one by one.
DataLoader automatically handles the last batch if it has fewer samples than batch_size: by default it simply yields a smaller final batch, and you can pass drop_last=True to discard it instead.
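A short sketch of that last-batch behavior, including the drop_last flag (the sample data is illustrative):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 samples with batch_size=4 leaves a final batch of only 2
data = TensorDataset(torch.arange(10).float())

default_loader = DataLoader(data, batch_size=4)  # keeps the short final batch
print([x.shape[0] for (x,) in default_loader])   # [4, 4, 2]

trimmed_loader = DataLoader(data, batch_size=4, drop_last=True)  # discards it
print([x.shape[0] for (x,) in trimmed_loader])   # [4, 4]
```

Dropping the short batch can be useful when a model or batch-norm layer assumes a fixed batch size.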
DataLoader splits data into batches to make training faster and easier.
Shuffling changes the data order each epoch to help the model learn better.
Using DataLoader saves you from writing extra code for batching and shuffling.
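To see roughly how much code DataLoader replaces, here is a sketch of doing the same batching and shuffling by hand (all variable names are illustrative):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

features = torch.arange(12).reshape(6, 2).float()
labels = torch.arange(6)

# Manual batching and shuffling: permute indices, then slice
perm = torch.randperm(len(features))
manual_batches = [
    (features[perm[i:i + 3]], labels[perm[i:i + 3]])
    for i in range(0, len(features), 3)
]

# The same thing with DataLoader: one line, reshuffled automatically each epoch
loader = DataLoader(TensorDataset(features, labels), batch_size=3, shuffle=True)
auto_batches = list(loader)

print(len(manual_batches), len(auto_batches))  # 2 2
```

The manual version also has to be re-run (re-permuted) every epoch, while DataLoader does that for you each time you iterate it.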