
Mixed precision training (AMP) in PyTorch - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is mixed precision training in deep learning?
Mixed precision training uses both 16-bit and 32-bit floating point numbers during model training to speed up computation and reduce memory use, while keeping model accuracy.
beginner
What does AMP stand for in PyTorch?
AMP stands for Automatic Mixed Precision. It is a PyTorch feature that automatically manages mixed precision training to make it easier and safer.
beginner
Why use mixed precision training? List two benefits.
1. Faster training, because 16-bit operations are quicker on modern GPUs.
2. Less GPU memory used, allowing bigger models or larger batches.
intermediate
How does PyTorch AMP help prevent underflow or overflow during training?
PyTorch AMP uses a technique called loss scaling. It multiplies the loss by a large factor before backpropagation so that small gradients stay representable in float16, then unscales the gradients before the optimizer step.
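The effect of loss scaling can be sketched with NumPy (the 2**16 scale factor mirrors GradScaler's default initial scale; the variable names are illustrative):

```python
import numpy as np

# A tiny gradient value that float16 cannot represent on its own.
grad = 1e-8

# Loss scaling: multiply before casting to float16 so the value survives.
scale = 2.0 ** 16                       # GradScaler's default init_scale (65536)
scaled = np.float16(grad * scale)       # 6.55e-4, well within float16 range
unscaled = np.float32(scaled) / scale   # unscale in float32 before the optimizer step

print(np.float16(grad))  # 0.0 -- without scaling, the gradient underflows
print(unscaled)          # ~1e-8, the gradient is recovered
```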
intermediate
Show a simple PyTorch code snippet to enable AMP during training.
scaler = torch.cuda.amp.GradScaler()
for data, target in dataloader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        output = model(data)
        loss = loss_fn(output, target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
What is the main purpose of using mixed precision training?
A. Make training more accurate by using 64-bit floats
B. Increase model size without changing speed
C. Speed up training and reduce memory use
D. Avoid using GPUs
Answer: C
In PyTorch AMP, what does the GradScaler do?
A. Scales the model weights
B. Scales the loss to prevent gradient underflow
C. Scales the input data
D. Scales the learning rate
Answer: B
Which PyTorch context manager is used to enable mixed precision operations?
A. torch.cuda.amp.autocast()
B. torch.cuda.amp.scale()
C. torch.enable_grad()
D. torch.no_grad()
Answer: A
What data types are mainly used in mixed precision training?
A. float16 and float32
B. float64 and float128
C. int32 and int64
D. int8 and float8
Answer: A
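The memory saving behind this answer is easy to verify; NumPy's float16 and float32 have the same per-element sizes as torch.float16 and torch.float32:

```python
import numpy as np

# float16 halves the per-element storage relative to float32,
# which is where AMP's memory savings come from.
print(np.dtype(np.float16).itemsize)  # 2 bytes
print(np.dtype(np.float32).itemsize)  # 4 bytes
```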
What happens if you do not use loss scaling in mixed precision training?
A. Training will be faster
B. Memory usage will increase
C. Model accuracy will always improve
D. Gradients might become zero due to underflow
Answer: D
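The underflow in option D can be reproduced directly; NumPy's float16 behaves like torch.float16 here:

```python
import numpy as np

# Values smaller than float16's tiniest subnormal (~6e-8) round to zero.
tiny_grad = np.float16(1e-8)
print(tiny_grad)         # 0.0 -- the gradient signal is lost
print(np.float32(1e-8))  # 1e-08 -- float32 still represents it
```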
Explain how mixed precision training works and why it is useful.
Think about how using smaller numbers can help speed and memory, but also what problem it causes.
Describe the steps to enable AMP in a PyTorch training loop.
Focus on the order of operations inside the training loop.
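Putting those steps in order, here is a minimal runnable sketch; the model, data, and hyperparameters are made up for illustration, and the enabled=use_amp flag lets the same loop fall back to plain float32 when no GPU is present:

```python
import torch

# Toy setup -- the model, shapes, and learning rate are illustrative only.
use_amp = torch.cuda.is_available()  # AMP needs a GPU; otherwise run as a no-op
device = "cuda" if use_amp else "cpu"
model = torch.nn.Linear(8, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

# 1. Create the GradScaler once, outside the loop.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

for step in range(3):
    data = torch.randn(16, 8, device=device)
    target = torch.randn(16, 1, device=device)
    # 2. Zero stale gradients.
    optimizer.zero_grad()
    # 3. Run the forward pass and loss computation under autocast.
    with torch.cuda.amp.autocast(enabled=use_amp):
        output = model(data)
        loss = loss_fn(output, target)
    # 4. Scale the loss, then backpropagate.
    scaler.scale(loss).backward()
    # 5. Unscale gradients and take the optimizer step (skipped on overflow).
    scaler.step(optimizer)
    # 6. Update the scale factor for the next iteration.
    scaler.update()
```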