PyTorch · ~20 mins

Gradient clipping in PyTorch - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual · intermediate
Why use gradient clipping in training?

Imagine you are training a neural network and notice the training loss suddenly spikes or the model weights become very large. Why would applying gradient clipping help in this situation?

A. It prevents gradients from becoming too large, avoiding unstable updates and exploding gradients.
B. It reduces the model size by pruning neurons with small gradients.
C. It increases the learning rate automatically to speed up training.
D. It normalizes the input data to have zero mean and unit variance.
💡 Hint

Think about what happens when gradients become very large during backpropagation.
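To see the mechanism the hint points at, here is a minimal sketch (values chosen for illustration, not taken from any question): a deliberately huge gradient is capped so the update stays bounded.

```python
import torch
from torch.nn.utils import clip_grad_norm_

# Illustrative parameter with an exploding gradient (norm = 500).
p = torch.nn.Parameter(torch.zeros(3))
p.grad = torch.tensor([300.0, 400.0, 0.0])

# clip_grad_norm_ rescales the gradient in place so its total L2 norm
# does not exceed max_norm, and returns the norm measured before clipping.
total_norm = clip_grad_norm_([p], max_norm=1.0)

print(total_norm.item())     # pre-clip norm: 500.0
print(p.grad.norm().item())  # post-clip norm: capped near 1.0
```

Without the clip, an optimizer step with this gradient would move the weights by a huge amount; with it, the update direction is preserved but its magnitude is bounded.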

Predict Output · intermediate
Output of gradient clipping code snippet

What will be the value of clipped_norm after running this PyTorch code?

PyTorch
import torch
from torch.nn.utils import clip_grad_norm_

model_params = [torch.nn.Parameter(torch.tensor([3.0, 4.0], requires_grad=True))]
for p in model_params:
    p.grad = torch.tensor([6.0, 8.0])

clipped_norm = clip_grad_norm_(model_params, max_norm=5.0)
print(round(clipped_norm.item(), 2))
A. 14.0
B. 5.0
C. 1.0
D. 10.0
💡 Hint

Calculate the norm of the original gradients before clipping.
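The hint's calculation can be sketched by hand: the total L2 norm of a gradient vector [a, b] is sqrt(a² + b²). The values below are illustrative only, not the ones in the question.

```python
import torch

# L2 norm of an illustrative gradient vector [3, 4]: sqrt(9 + 16) = 5.
grad = torch.tensor([3.0, 4.0])
total_norm = torch.sqrt((grad ** 2).sum())
print(total_norm.item())  # 5.0
```

Compare this norm against max_norm: if it is larger, the gradients get scaled down, but the value `clip_grad_norm_` returns is the norm measured before any scaling.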

Model Choice · advanced
Choosing when to apply gradient clipping

You are training two models: a shallow feedforward network and a deep recurrent neural network (RNN). Which model benefits more from gradient clipping and why?

A. The deep RNN, because it has many layers and is prone to exploding gradients during backpropagation through time.
B. The shallow feedforward network, because it has fewer layers and gradients can explode easily.
C. Both models benefit equally from gradient clipping regardless of architecture.
D. Neither model benefits from gradient clipping; it is only useful for convolutional networks.
💡 Hint

Consider which model type is more likely to have exploding gradients.
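For context on where clipping sits in practice, here is a hypothetical minimal RNN training step (sizes and learning rate are arbitrary): the clip goes after `backward()` and before `optimizer.step()`, which is the standard placement.

```python
import torch
from torch.nn.utils import clip_grad_norm_

# Illustrative tiny RNN; long sequences amplify gradients through time.
rnn = torch.nn.RNN(input_size=4, hidden_size=8, batch_first=True)
optimizer = torch.optim.SGD(rnn.parameters(), lr=0.1)

x = torch.randn(2, 5, 4)  # (batch, seq_len, features)
out, _ = rnn(x)
loss = out.pow(2).mean()  # placeholder loss for the sketch

optimizer.zero_grad()
loss.backward()
# Clip between backward() and step(), so the update uses bounded gradients.
clip_grad_norm_(rnn.parameters(), max_norm=1.0)
optimizer.step()
```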

Hyperparameter · advanced
Effect of max_norm value in gradient clipping

In PyTorch's clip_grad_norm_, what happens if you set max_norm to a very small value like 0.1 during training?

A. Gradients will be amplified to speed up training.
B. Gradients will be ignored and training will proceed without updates.
C. Gradients will be scaled down heavily, possibly slowing or stopping learning.
D. The model will automatically increase the learning rate to compensate.
💡 Hint

Think about what happens when gradients are clipped to a very small norm.
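As a sketch of the hint's point (illustrative values): when max_norm is much smaller than the gradient norm, every gradient is multiplied by roughly max_norm / total_norm, which can make updates vanishingly small.

```python
import torch
from torch.nn.utils import clip_grad_norm_

# Illustrative gradient with total norm 10.
p = torch.nn.Parameter(torch.zeros(2))
p.grad = torch.tensor([6.0, 8.0])

# With max_norm=0.1, the scale factor is about 0.1 / 10 = 0.01.
clip_grad_norm_([p], max_norm=0.1)
print(p.grad)  # roughly tensor([0.0600, 0.0800])
```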

🔧 Debug · expert
Identifying error in gradient clipping usage

Consider this PyTorch training loop snippet. What happens when it runs, and why?

PyTorch
import torch
from torch.nn.utils import clip_grad_norm_

model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.tensor([[1.0, 2.0]])
targets = torch.tensor([[1.0]])

optimizer.zero_grad()
outputs = model(inputs)
loss = torch.nn.functional.mse_loss(outputs, targets)
loss.backward()

clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()

# Next iteration without zero_grad
outputs = model(inputs)
loss = torch.nn.functional.mse_loss(outputs, targets)
loss.backward()
clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
A. ValueError: max_norm must be positive.
B. No error; code runs fine.
C. TypeError: clip_grad_norm_ expects a list of tensors.
D. RuntimeError: Trying to backward through the graph a second time without retaining it.
💡 Hint

Look at the second backward call without clearing gradients.
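The mechanics behind the hint can be sketched in isolation (toy values, separate from the snippet above): each forward pass builds a fresh graph, so a second `backward()` on a new loss is legal, and without `zero_grad()` the gradients accumulate in `.grad`.

```python
import torch

w = torch.nn.Parameter(torch.tensor([2.0]))

# First forward/backward: d(3w)/dw = 3, so w.grad becomes 3.0.
loss1 = (w * 3).sum()
loss1.backward()
g1 = w.grad.clone()

# Second forward pass builds a NEW graph, so backward() raises no error;
# the new gradient is added onto the existing one.
loss2 = (w * 3).sum()
loss2.backward()
print(w.grad.item())  # 6.0 — the two gradients accumulated
```

This accumulation is why training loops call `optimizer.zero_grad()` each iteration; skipping it changes the effective gradient but does not, by itself, raise an exception.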