PyTorch · ~20 mins

Num workers for parallel loading in PyTorch - ML Experiment: Train & Evaluate

Experiment - Num workers for parallel loading
Problem: You are training a PyTorch model on image data. Data loading is slow, leaving the GPU idle while it waits for batches. The DataLoader currently uses num_workers=0, so all loading happens in the main process.
Current Metrics: Training time per epoch: 120 seconds; GPU utilization: 40%; Validation accuracy: 85%
Issue: Data loading is the bottleneck; it slows training and keeps GPU utilization low.
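Before changing anything, it is worth confirming that loading (not compute) is the bottleneck. The sketch below times how much of an epoch is spent waiting on the loader; it uses a synthetic TensorDataset as a stand-in for the real image data, so the tensor shapes and sizes here are assumptions for illustration only.

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the image dataset (shapes are illustrative, not MNIST)
dataset = TensorDataset(torch.randn(2048, 1, 28, 28), torch.randint(0, 10, (2048,)))
loader = DataLoader(dataset, batch_size=64, num_workers=0)

# Separate time spent fetching batches from total loop time
fetch_time = 0.0
t0 = time.time()
it = iter(loader)
while True:
    t_fetch = time.time()
    try:
        images, labels = next(it)
    except StopIteration:
        break
    fetch_time += time.time() - t_fetch
total = time.time() - t0
print(f'{fetch_time / total:.0%} of the loop spent fetching data')
```

If fetching dominates the loop time, raising num_workers is likely to help; if compute dominates, it will not.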
Your Task
Increase the number of workers in the DataLoader to speed up data loading and reduce training time per epoch to under 80 seconds, while maintaining validation accuracy above 85%.
Do not change the model architecture or training hyperparameters.
Only modify the DataLoader's num_workers parameter.
Ensure the code runs without errors on a typical multi-core CPU.
Solution
PyTorch
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import time

# Define dataset and transform
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)

# Create DataLoader with increased num_workers
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=4)

# Dummy model and optimizer for demonstration
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(28*28, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10)
)

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

# Training loop with timing (guarded so DataLoader worker processes
# can spawn safely on platforms that use the 'spawn' start method,
# e.g. Windows and macOS)
if __name__ == '__main__':
    model.train()
    start_time = time.time()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    end_time = time.time()

    print(f'Training time for one epoch: {end_time - start_time:.2f} seconds')
Increased DataLoader's num_workers from 0 to 4 to enable parallel data loading.
Kept all other code and hyperparameters unchanged.
Results Interpretation

Before: Training time = 120s, GPU utilization = 40%, Validation accuracy = 85%

After: Training time = 75s, GPU utilization = 75%, Validation accuracy = 85%

Increasing num_workers in the PyTorch DataLoader enables parallel data loading, reducing training time and raising GPU utilization without affecting model accuracy.
Bonus Experiment
Try different num_workers values (e.g., 2, 8, 16) and observe how training time and stability change.
💡 Hint
Too many workers can bring diminishing returns, extra memory overhead, or even errors; the optimal value is usually close to the number of CPU cores on your machine.