Recall & Review
beginner
What is mixed precision training in deep learning?
Mixed precision training uses both 16-bit and 32-bit floating point numbers during model training to speed up computation and reduce memory use, while keeping model accuracy.
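The memory saving is easy to verify directly. A minimal sketch using NumPy, whose float16 and float32 types are the same IEEE 754 formats PyTorch uses:

```python
import numpy as np

# A tensor of one million parameters in each precision.
fp32 = np.zeros(1_000_000, dtype=np.float32)
fp16 = np.zeros(1_000_000, dtype=np.float16)

print(fp32.nbytes)  # 4000000 bytes (4 bytes per value)
print(fp16.nbytes)  # 2000000 bytes (2 bytes per value) -> half the memory
```

Halving the bytes per value is what allows larger models or batch sizes to fit in the same GPU memory.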
beginner
What does AMP stand for in PyTorch?
AMP stands for Automatic Mixed Precision. It is a PyTorch feature that automatically manages mixed precision training to make it easier and safer.
beginner
Why use mixed precision training? List two benefits.
1. Faster training, because 16-bit operations are quicker on modern GPUs.
2. Less GPU memory used, allowing bigger models or larger batches.
intermediate
How does PyTorch AMP help prevent underflow or overflow during training?
PyTorch AMP uses a technique called loss scaling. It multiplies the loss by a large scale factor before backpropagation so that small gradients stay representable in float16, then unscales the gradients before the optimizer step.
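The underflow problem and the effect of scaling can be sketched with NumPy's float16. This is a toy illustration of the idea, not PyTorch's actual GradScaler logic:

```python
import numpy as np

grad = 1e-8  # a small gradient value, perfectly fine in float32

# Cast directly to float16: 1e-8 is below float16's smallest
# subnormal (~6e-8), so it underflows to exactly zero.
print(np.float16(grad))  # 0.0

# Scale first by 2**16 (a typical starting scale), as AMP does.
scale = 2.0 ** 16
scaled = np.float16(grad * scale)  # now representable in float16
print(scaled)                      # ~0.00065536, nonzero
print(scaled / scale)              # unscale in higher precision -> ~1e-8
```

Without the scale factor the gradient information is lost entirely; with it, the value survives the float16 round trip and can be unscaled before the weight update.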
intermediate
Show a simple PyTorch code snippet to enable AMP during training.
scaler = torch.cuda.amp.GradScaler()
for data, target in dataloader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # run the forward pass in mixed precision
        output = model(data)
        loss = loss_fn(output, target)
    scaler.scale(loss).backward()    # scale the loss to avoid gradient underflow
    scaler.step(optimizer)           # unscale gradients, then take the optimizer step
    scaler.update()                  # adjust the scale factor for the next iteration
What is the main purpose of using mixed precision training?
Mixed precision training speeds up training and reduces memory by using 16-bit floats where possible.
In PyTorch AMP, what does the GradScaler do?
GradScaler multiplies the loss so that gradients stay large enough to be representable in float16 during backpropagation, then unscales them before the optimizer step.
Which PyTorch context manager is used to enable mixed precision operations?
torch.cuda.amp.autocast() enables automatic mixed precision for operations inside its block.
What data types are mainly used in mixed precision training?
Mixed precision training uses float16 (half precision) and float32 (single precision) floats.
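The limits of the two formats can be inspected with NumPy's `finfo`, since its float16 and float32 are the same IEEE 754 types:

```python
import numpy as np

for dtype in (np.float16, np.float32):
    info = np.finfo(dtype)
    # max: largest representable value; tiny: smallest normal value
    print(dtype.__name__, info.max, info.tiny)
```

float16 tops out at 65504 with a smallest normal value around 6.1e-5, while float32 reaches about 3.4e38, which is why float16 is far more prone to both overflow and underflow.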
What happens if you do not use loss scaling in mixed precision training?
Without loss scaling, small gradient values can underflow to zero in float16, harming training.
Explain how mixed precision training works and why it is useful.
Think about how using smaller numbers can help speed and memory, but also what problem it causes.
Describe the steps to enable AMP in a PyTorch training loop.
Focus on the order of operations inside the training loop.