PyTorch · ~20 mins

GPU tensors (to, cuda) in PyTorch - ML Experiment: Train & Evaluate

Experiment - GPU tensors (to, cuda)
Problem: You have a simple neural network training on the CPU. Training is slow because it does not use GPU acceleration.
Current Metrics: Training time per epoch: 12 seconds, Training accuracy: 85%, Validation accuracy: 83%
Issue: Training is slow because the model and its tensors live on the CPU instead of the GPU, which limits speed and efficiency.
Your Task
Move the model and data tensors to GPU to speed up training time while maintaining or improving accuracy.
You must use PyTorch and keep the same model architecture.
You cannot change the dataset or batch size.
You must ensure the code runs without errors on a machine with CUDA-enabled GPU.
Solution
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Check device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Simple model
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 10)
    def forward(self, x):
        x = x.view(-1, 28*28)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Data
transform = transforms.ToTensor()
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

# Model
model = SimpleNN().to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())

# Training loop
for epoch in range(3):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)  # Move to GPU
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        _, predicted = torch.max(outputs, 1)  # avoid .data; it bypasses autograd tracking
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    accuracy = 100 * correct / total
    print(f'Epoch {epoch+1}, Loss: {running_loss/len(train_loader):.4f}, Accuracy: {accuracy:.2f}%')
Key Changes
Added device detection with torch.device and torch.cuda.is_available().
Moved the model to the GPU with model.to(device).
Moved input tensors and labels to the GPU inside the training loop with images.to(device) and labels.to(device).
Results Interpretation

Before: Training time per epoch: 12 seconds, Accuracy: 85%

After: Training time per epoch: 3 seconds, Accuracy: 85%

Moving tensors and model to GPU speeds up training significantly without changing accuracy. Using .to('cuda') or .cuda() is essential for GPU acceleration in PyTorch.
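The difference between .to(device) and .cuda() can be seen in a minimal, device-agnostic sketch (the tensor shapes here are arbitrary, chosen only for illustration):

```python
import torch

# Device-agnostic setup: falls back to CPU when no GPU is present
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.randn(4, 4)
print(x.device)   # tensors are created on the CPU by default

y = x.to(device)  # safe on any machine: moves to GPU only if one exists
print(y.device)

# x.cuda() is equivalent to x.to('cuda'), but it raises a RuntimeError on a
# machine without a CUDA GPU, so the .to(device) pattern above is preferred
# for code that must run everywhere.
```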
Bonus Experiment
Try adding a validation loop on GPU to measure validation accuracy after each epoch.
💡 Hint
Move validation data to the same device and disable gradient calculation with torch.no_grad() during validation.
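One possible shape for that validation loop is sketched below. It reuses the same device-agnostic pattern as the training code; a small synthetic tensor dataset stands in for the real MNIST validation split so the sketch is self-contained (the evaluate helper and the synthetic data are illustrative, not part of the original solution):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Same architecture as the training example, written as a Sequential for brevity
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
).to(device)

def evaluate(model, loader, device):
    model.eval()  # switch layers like dropout/batch-norm to inference mode
    correct = total = 0
    with torch.no_grad():  # no gradients needed: saves memory and time
        for images, labels in loader:
            # Validation data must be on the same device as the model
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            predicted = outputs.argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return 100 * correct / total

# Synthetic stand-in for the MNIST validation set (256 fake 28x28 images)
val_data = TensorDataset(torch.randn(256, 1, 28, 28),
                         torch.randint(0, 10, (256,)))
val_loader = DataLoader(val_data, batch_size=64)

print(f'Validation accuracy: {evaluate(model, val_loader, device):.2f}%')
```

Calling evaluate(model, val_loader, device) at the end of each training epoch would report validation accuracy alongside the training metrics already printed in the solution.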