PyTorch · ML · ~15 mins

DataLoader basics in PyTorch - ML Experiment: Train & Evaluate

Experiment - DataLoader basics
Problem: You want to load and batch your dataset efficiently for training a neural network in PyTorch.
Current Metrics: N/A; data is currently loaded manually, without batching or shuffling.
Issue: Manual data loading is slow and error-prone, and the lack of batching and shuffling makes training inefficient.
Your Task
Use PyTorch DataLoader to load data in batches with shuffling to improve training efficiency.
Use the provided simple dataset (a list of numbers).
Batch size must be 4.
Enable shuffling of data.
Solution
PyTorch
import torch
from torch.utils.data import TensorDataset, DataLoader

# Create a simple dataset of numbers 0 to 19
data = torch.arange(20)

# Wrap data in TensorDataset (no labels needed here)
dataset = TensorDataset(data)

# Create DataLoader with batch size 4 and shuffle enabled
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)

# Iterate over DataLoader and print batches
for batch_idx, (batch_data,) in enumerate(dataloader):
    print(f"Batch {batch_idx + 1}: {batch_data.tolist()}")
Wrapped raw data in TensorDataset to make it compatible with DataLoader.
Created DataLoader with batch_size=4 and shuffle=True to load data in batches and shuffle each epoch.
Used a loop to iterate over DataLoader and print batches to verify correct batching and shuffling.
Results Interpretation

Before: Data loaded manually one by one, no batching or shuffling.

After: DataLoader loads data in shuffled batches of 4, improving efficiency and randomness.

PyTorch's DataLoader loads data efficiently in batches and reshuffles it every epoch, which speeds up training and helps the model generalize better.
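To see the per-epoch reshuffling in action, a minimal sketch (reusing the same dataset as the solution) iterates over the DataLoader twice; each pass draws a fresh random order, while every element still appears exactly once per epoch:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

dataset = TensorDataset(torch.arange(20))
loader = DataLoader(dataset, batch_size=4, shuffle=True)

# Each full pass over the DataLoader is one "epoch" with a new random order
epoch1 = [batch.tolist() for (batch,) in loader]
epoch2 = [batch.tolist() for (batch,) in loader]

print("Epoch 1:", epoch1)
print("Epoch 2:", epoch2)
```

The two printed epochs almost always differ in order, but sorting all values from either epoch recovers 0..19 exactly, confirming shuffling changes only the order, never the contents.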
Bonus Experiment
Try using DataLoader with a custom dataset class that returns both features and labels.
💡 Hint
Create a class inheriting from torch.utils.data.Dataset and implement __len__ and __getitem__ methods.