PyTorch · ~5 mins

Gradient clipping in PyTorch - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is gradient clipping in machine learning?
Gradient clipping is a technique to limit or "clip" the gradients during training to prevent them from becoming too large, which helps avoid unstable updates and exploding gradients.
beginner
Why do exploding gradients cause problems during training?
Exploding gradients cause very large updates to model weights, which can make the training unstable and cause the model to fail to learn properly.
intermediate
How does PyTorch implement gradient clipping?
PyTorch provides functions like torch.nn.utils.clip_grad_norm_ and torch.nn.utils.clip_grad_value_ to clip gradients by norm or by value before the optimizer updates the model weights.
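As a quick sketch of both utilities (the model here is an illustrative placeholder; gradients must already be populated by a backward pass):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
model(torch.randn(8, 4)).sum().backward()  # populate .grad on each parameter

# Clip by norm: if the combined L2 norm of all gradients exceeds max_norm,
# rescale them together; returns the norm measured before clipping.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# Clip by value: clamp every gradient element into [-0.5, 0.5] independently.
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=0.5)
```

Both functions modify the `.grad` tensors in place, so they must run after `backward()` and before `optimizer.step()`.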
intermediate
What is the difference between clipping gradients by norm and by value?
Clipping by norm scales all gradients so their total length (norm) does not exceed a threshold, while clipping by value limits each individual gradient element to a maximum absolute value.
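A small numeric comparison (the gradient values are hypothetical) makes the distinction concrete:

```python
import torch

g = torch.tensor([3.0, 4.0])  # gradient vector with L2 norm 5.0

# Clip by norm (max_norm=1.0): scale the whole vector by 1.0 / 5.0,
# preserving its direction.
clipped_by_norm = g * (1.0 / g.norm())   # tensor([0.6, 0.8])

# Clip by value (clip_value=1.0): clamp each element independently,
# which can change the vector's direction.
clipped_by_value = g.clamp(-1.0, 1.0)    # tensor([1.0, 1.0])
```

Note that norm clipping keeps the gradient direction intact, while value clipping can distort it.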
beginner
Show a simple PyTorch code snippet to clip gradients by norm.
After calling loss.backward(), apply torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0) before optimizer.step() so the gradients are clipped to a maximum norm of 1.0.
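Expanded into a minimal training-loop sketch (the model, data, and optimizer are illustrative placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)

for _ in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                           # 1. compute gradients
    torch.nn.utils.clip_grad_norm_(           # 2. clip them in place
        model.parameters(), max_norm=1.0)
    optimizer.step()                          # 3. apply the (clipped) update
```

The ordering is the key point: clipping must happen between the backward pass, which fills the gradients, and the optimizer step, which consumes them.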
What problem does gradient clipping mainly solve?
A. Exploding gradients
B. Vanishing gradients
C. Overfitting
D. Underfitting
Answer: A
Which PyTorch function clips gradients by their norm?
A. torch.nn.utils.clip_grad_value_
B. torch.clip_gradients
C. torch.nn.utils.clip_grad_norm_
D. torch.gradient_clip
Answer: C
When should gradient clipping be applied during training?
A. Before model initialization
B. Before loss.backward()
C. After optimizer.step()
D. After loss.backward() and before optimizer.step()
Answer: D
Clipping gradients by value means:
A. Limiting each gradient element to a max absolute value
B. Scaling all gradients to have a fixed norm
C. Setting all gradients to zero
D. Increasing gradient values
Answer: A
What happens if gradients are not clipped and explode?
A. Model trains faster
B. Training becomes unstable and may fail
C. Model accuracy improves automatically
D. Nothing changes
Answer: B
Explain in your own words what gradient clipping is and why it is useful.
Think about what happens when gradients get too big during training.
Describe how to apply gradient clipping in a PyTorch training loop.
Remember the order of operations in training.