PyTorch · ~20 mins

Mixed precision training (AMP) in PyTorch - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual · intermediate
Why use mixed precision training (AMP)?

What is the main benefit of using Automatic Mixed Precision (AMP) in PyTorch training?

A. It increases model accuracy by using higher precision calculations everywhere.
B. It reduces memory usage and speeds up training by using float16 where possible.
C. It automatically tunes hyperparameters during training.
D. It converts the model to run on CPU instead of GPU.
💡 Hint

Think about how using smaller number formats affects speed and memory.
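For intuition, float16 stores each element in half the bytes of float32, which is where AMP's memory savings come from. A minimal illustration (no GPU needed):

```python
import torch

x32 = torch.randn(1024, 1024)   # float32: 4 bytes per element
x16 = x32.half()                # float16: 2 bytes per element

print(x32.element_size(), x16.element_size())  # 4 2
```

Halving the per-element size also lets tensor-core hardware move and multiply more values per cycle, which is where the speedup comes from.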

Predict Output · intermediate
Output of AMP training step snippet

What type and dtype will be printed for the loss after this AMP training step?

PyTorch
import torch
from torch.cuda.amp import autocast, GradScaler

model = torch.nn.Linear(2, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = GradScaler()

inputs = torch.tensor([[1.0, 2.0]], device='cuda')
target = torch.tensor([[1.0]], device='cuda')

optimizer.zero_grad()
with autocast():
    output = model(inputs)
    loss = torch.nn.functional.mse_loss(output, target)
print(type(loss), loss.dtype)
A. <class 'torch.Tensor'> with dtype=torch.float32
B. <class 'torch.Tensor'> with dtype=torch.float16
C. <class 'torch.Tensor'> with dtype=torch.float64
D. RuntimeError due to dtype mismatch
💡 Hint

AMP uses float16 for some ops but loss is usually float32 for stability.
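The dtype behavior can be observed without a GPU via CPU autocast, which plays the same role using bfloat16. This is a sketch, not the snippet above; note that on CUDA, autocast additionally keeps loss ops such as mse_loss in float32, per the PyTorch AMP op reference:

```python
import torch

model = torch.nn.Linear(2, 1)
x = torch.tensor([[1.0, 2.0]])

with torch.autocast(device_type='cpu', dtype=torch.bfloat16):
    out = model(x)   # linear layers run in the reduced dtype

print(out.dtype)     # torch.bfloat16
```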

Model Choice · advanced
Choosing model parts for AMP

Which part of a model should NOT be wrapped inside autocast() for AMP training?

A. The forward pass of the neural network layers
B. The loss calculation function
C. The data loading pipeline
D. The optimizer step call
💡 Hint

Consider which operations benefit from mixed precision and which do not.
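One way to see the scoping at work: autocast only changes the dtype of ops executed inside the context, and leaves everything outside it untouched. A small CPU-only sketch:

```python
import torch

a, b = torch.randn(2, 2), torch.randn(2, 2)

with torch.autocast(device_type='cpu', dtype=torch.bfloat16):
    inside = a @ b    # matmul is autocast to the reduced dtype

outside = a @ b       # the same op outside the context stays float32

print(inside.dtype, outside.dtype)  # torch.bfloat16 torch.float32
```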

Hyperparameter · advanced
Effect of GradScaler in AMP

What is the role of GradScaler in PyTorch AMP training?

A. It converts all model weights to float16 permanently.
B. It automatically adjusts learning rate during training.
C. It scales the loss to prevent underflow in gradients during backpropagation.
D. It disables gradient computation for certain layers.
💡 Hint

Think about why gradients might vanish when using float16.
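The underflow the hint alludes to is easy to reproduce: a gradient-sized value that float32 represents fine rounds to zero in float16, while multiplying by a scale factor first (what GradScaler does; 2**16 here is just an illustrative scale) keeps it representable:

```python
import torch

tiny = torch.tensor(1e-8)         # fits comfortably in float32
print(tiny.half())                # underflows to 0.0 in float16

scaled = (tiny * 2.0**16).half()  # scale first, as GradScaler does
print(scaled > 0)                 # tensor(True): the value survives
```

After backward, GradScaler divides the gradients by the same factor before the optimizer step, so the update itself is unchanged.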

🔧 Debug · expert
Debugging AMP training error

Given this AMP training snippet, what error will occur and why?

import torch
from torch.cuda.amp import autocast, GradScaler

model = torch.nn.Linear(2, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = GradScaler()

inputs = torch.tensor([[1.0, 2.0]], device='cuda')
target = torch.tensor([[1.0]], device='cuda')

optimizer.zero_grad()
with autocast():
    output = model(inputs)
    loss = torch.nn.functional.mse_loss(output, target)
loss.backward()
scaler.step(optimizer)
scaler.update()
A. RuntimeError: You must call scaler.scale(loss).backward() instead of loss.backward()
B. TypeError: optimizer.step() missing required positional argument
C. No error, code runs successfully
D. RuntimeError: autocast context must wrap optimizer.step()
💡 Hint

Check how gradients are computed when using GradScaler.
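For reference, the canonical GradScaler recipe from the PyTorch AMP docs looks like the sketch below. The enabled flag is an idiomatic way to let the same loop fall back gracefully on a machine without CUDA, where scale/step/update become pass-throughs:

```python
import torch

use_cuda = torch.cuda.is_available()
device = 'cuda' if use_cuda else 'cpu'

model = torch.nn.Linear(2, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)  # no-op when disabled

inputs = torch.tensor([[1.0, 2.0]], device=device)
target = torch.tensor([[1.0]], device=device)

optimizer.zero_grad()
with torch.autocast(device_type=device, enabled=use_cuda):
    output = model(inputs)
    loss = torch.nn.functional.mse_loss(output, target)

scaler.scale(loss).backward()  # scale the loss before backward
scaler.step(optimizer)         # unscales grads; skips step on inf/nan
scaler.update()                # adjusts the scale factor for next time
print(loss.item())
```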