
Zeroing gradients in PyTorch - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output (intermediate)
What is the output of this PyTorch code snippet?

Consider the following PyTorch code that performs a backward pass and then prints the gradient of a tensor. What will be printed?

PyTorch
import torch
x = torch.tensor([2.0], requires_grad=True)
y = x * 3
z = y ** 2
z.backward()
print(x.grad)
x.grad.zero_()
print(x.grad)
A. tensor([36.])\ntensor([36.])
B. tensor([36.])\ntensor([0.])
C. tensor([6.])\ntensor([0.])
D. tensor([6.])\ntensor([6.])
💡 Hint

Remember that backward() computes gradients and zero_() sets them to zero in-place.
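To see the hint in action on a different computation (the tensor value and function here are illustrative, not taken from the problem above), a minimal sketch:

```python
import torch

# Illustrative example: backward() populates .grad, zero_() clears it in place.
w = torch.tensor([4.0], requires_grad=True)
loss = (w * 2).sum()   # d(loss)/dw = 2
loss.backward()        # fills w.grad with the gradient
print(w.grad)          # tensor([2.])
w.grad.zero_()         # in-place: the same tensor object, now all zeros
print(w.grad)          # tensor([0.])
```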

Model Choice (intermediate)
Which PyTorch method correctly resets gradients before a new training step?

In a typical training loop, which method should be called to clear gradients before computing new ones?

A. torch.no_grad()
B. model.zero_grad()
C. optimizer.zero_grad()
D. loss.backward()
💡 Hint

Think about which object manages the parameters and their gradients during optimization.
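A minimal training-loop sketch showing where gradient clearing fits; the model, data, and learning rate here are illustrative assumptions, not part of the problem:

```python
import torch

# Tiny regression setup (illustrative): fit y = 2 from x = 1.
model = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.tensor([[1.0]])
target = torch.tensor([[2.0]])

losses = []
for _ in range(3):
    optimizer.zero_grad()                         # clear gradients from the previous step
    loss = ((model(x) - target) ** 2).mean()
    loss.backward()                               # compute fresh gradients for this step
    optimizer.step()                              # update parameters using those gradients
    losses.append(loss.item())
```

Note that the clearing call belongs to the object that manages the parameter updates, before each new backward pass.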

Hyperparameter (advanced)
What happens if you forget to zero gradients in a PyTorch training loop?

Consider a training loop where optimizer.zero_grad() is never called. What is the effect on the gradients during training?

A. Gradients accumulate, causing updates to be larger than intended.
B. Gradients are reset automatically each step, so no effect.
C. Training will raise a runtime error due to missing zeroing.
D. Gradients become zero, so model weights do not update.
💡 Hint

Think about how PyTorch accumulates gradients by default.
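A small sketch of the default behavior the hint refers to (the tensor and function are illustrative):

```python
import torch

# Repeated backward() calls without zeroing: gradients add up by default.
x = torch.tensor([1.0], requires_grad=True)

(x * 5).sum().backward()   # d/dx = 5
print(x.grad)              # tensor([5.])

(x * 5).sum().backward()   # no zeroing in between, so gradients accumulate
print(x.grad)              # tensor([10.])

x.grad.zero_()             # reset before the next backward pass
(x * 5).sum().backward()
print(x.grad)              # tensor([5.])
```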

🔧 Debug (advanced)
Why does this PyTorch code raise an error when zeroing gradients?

Examine the code below. Why does it raise an error at x.grad.zero_()?

PyTorch
import torch
x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x.sum()
y.backward()
x.grad = None
x.grad.zero_()
A. x.grad is None, so calling zero_() raises AttributeError.
B. x.grad is not a tensor, so zero_() is undefined.
C. x.grad has already been zeroed, so zero_() fails.
D. x.grad is a scalar, and zero_() requires a tensor.
💡 Hint

What happens if you assign None to a variable and then call a method on it?
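A sketch reproducing the failure and one possible guard (the guard pattern is an illustrative suggestion, not the only fix):

```python
import torch

# Assigning None discards the gradient tensor entirely, so there is
# nothing left for an in-place method to act on.
x = torch.tensor([1.0, 2.0], requires_grad=True)
x.sum().backward()
x.grad = None                  # the gradient tensor is gone
try:
    x.grad.zero_()             # fails: None has no zero_() method
except AttributeError as err:
    print("caught:", err)

# One safe pattern: guard before zeroing in place.
if x.grad is not None:
    x.grad.zero_()
```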

🧠 Conceptual (expert)
Why is zeroing gradients important in mini-batch training?

In mini-batch gradient descent, why must gradients be zeroed before processing each batch?

A. To save memory by deleting old gradients permanently.
B. To initialize model weights to zero before each batch.
C. Because gradients are automatically reset after optimizer.step().
D. To prevent gradient accumulation from previous batches, ensuring correct updates.
💡 Hint

Consider how PyTorch handles gradients across multiple backward passes.
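A sketch contrasting the two behaviors across batches (the parameter and batch values are illustrative):

```python
import torch

# Without zeroing, gradients from every batch pile up in the same .grad
# tensor; with zeroing, only the current batch contributes.
w = torch.tensor([1.0], requires_grad=True)
batches = [torch.tensor([2.0]), torch.tensor([3.0])]

# No zeroing: both batch gradients accumulate.
for b in batches:
    (w * b).sum().backward()   # d/dw = b for each batch
print(w.grad)                  # tensor([5.])  (2 + 3)

# Zeroing between batches: only the current batch's gradient remains.
w.grad.zero_()
(w * batches[1]).sum().backward()
print(w.grad)                  # tensor([3.])
```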