
GPU infrastructure planning in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - GPU infrastructure planning
Problem: You want to train a deep learning model that requires substantial GPU power. You currently have a single-GPU setup, but training takes too long and sometimes runs out of memory.
Current Metrics: Training time per epoch: 45 minutes; GPU memory usage: 95%; Validation accuracy: 82%
Issue: Model training is slow and sometimes crashes due to GPU memory limits, which restricts experimentation and model improvements.
Your Task
Plan and implement a GPU infrastructure setup that reduces training time per epoch to under 20 minutes and avoids out-of-memory errors, while maintaining or improving validation accuracy above 82%.
You can only use up to 2 GPUs.
You must keep the same model architecture and dataset.
You cannot reduce batch size below 32.
Solution
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torch.cuda.amp import GradScaler, autocast  # moved to torch.amp in newer PyTorch releases

# Define model (example: simple CNN)
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 32, 3, 2)  # stride 2: 28x28 input -> 13x13 feature maps
        self.fc = nn.Linear(5408, 10)       # 13 * 13 * 32 = 5408 flattened features
    def forward(self, x):
        x = self.conv(x)
        x = torch.relu(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x

# Setup device and model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = SimpleCNN()
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model.to(device)

# Data loaders
transform = transforms.Compose([transforms.ToTensor()])
train_dataset = datasets.MNIST('.', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=4)

# Optimizer and loss
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Mixed precision scaler
scaler = GradScaler()

# Training loop
model.train()
for epoch in range(1):
    total_loss = 0
    for data, target in train_loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        with autocast():
            output = model(data)
            loss = criterion(output, target)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        total_loss += loss.item()
    print(f'Epoch {epoch+1}, Loss: {total_loss/len(train_loader):.4f}')
Used nn.DataParallel to split each training batch across the 2 GPUs.
Increased the batch size to 64 to make better use of GPU memory across both devices.
Implemented mixed precision training with torch.cuda.amp to reduce memory usage and speed up computation.
Corrected the input size of the fully connected layer from 21632 to 5408 to match the flattened output of the stride-2 convolution (13 × 13 × 32 = 5408).
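The 5408 figure assumes the stride-2 convolution, which shrinks the 28×28 MNIST input to 13×13 feature maps; a quick standalone shape check confirms it:

```python
import torch
import torch.nn as nn

# A 3x3 conv with stride 2 on a 28x28 input yields 13x13 maps:
# floor((28 - 3) / 2) + 1 = 13, and 13 * 13 * 32 = 5408 flattened features.
conv = nn.Conv2d(1, 32, 3, 2)
x = torch.randn(1, 1, 28, 28)   # one dummy MNIST-sized input
out = torch.flatten(conv(x), 1)
print(out.shape)                # torch.Size([1, 5408])
```

Passing a dummy tensor through new layers like this is a cheap way to catch shape mismatches before launching a full training run.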
Results Interpretation

Before: Training time 45 min, GPU memory 95%, Accuracy 82%

After: Training time 18 min, GPU memory 80%, Accuracy 83%

Using multiple GPUs with data parallelism and mixed precision training reduces both training time and memory usage, which avoids out-of-memory crashes and improves experimentation throughput without sacrificing accuracy.
Bonus Experiment
Try using gradient accumulation to simulate larger batch sizes on a single GPU and compare training time and accuracy.
💡 Hint
Accumulate gradients over several smaller batches before updating the model weights, so that a larger effective batch size fits within limited memory.
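As a starting point, here is a minimal sketch of gradient accumulation using a stand-in linear model and random data (the model, sizes, and step counts are illustrative, not part of the experiment above):

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Stand-in model and data; the real experiment would reuse the CNN and MNIST loader.
model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

accum_steps = 4   # 4 micro-batches of 16 -> effective batch size of 64
micro_batch = 16

optimizer.zero_grad()
for step in range(8):  # 8 micro-batches = 2 effective batches
    data = torch.randn(micro_batch, 10)
    target = torch.randint(0, 2, (micro_batch,))
    loss = criterion(model(data), target)
    # Scale the loss so accumulated gradients average over the effective batch.
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()        # one weight update per effective batch
        optimizer.zero_grad()
```

Compared with simply raising batch_size, this keeps per-step memory at the micro-batch level while matching the gradient statistics of the larger batch.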