Challenge - 5 Problems
Backward Pass Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
Intermediate · 2:00 remaining
Output of backward pass gradient accumulation
Consider the following PyTorch code snippet. What will be the value of x.grad after running all lines?

import torch
x = torch.tensor(2.0, requires_grad=True)
y = x * 3
loss = y ** 2
loss.backward()
loss = (x * 4) ** 2
loss.backward()
print(x.grad.item())
Attempts: 2 left
💡 Hint
Remember that gradients accumulate by default in PyTorch unless you clear them.
✗ Incorrect
The first backward computes a gradient of 36 (loss = (3x)² = 9x², so d(loss)/dx = 18x = 36 at x = 2); the second computes 64 (loss = (4x)² = 16x², so d(loss)/dx = 32x = 64). Because PyTorch accumulates gradients, x.grad ends up as 36 + 64 = 100.
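The accumulation above can be sketched directly; the extra x.grad.zero_() call at the end (not part of the quiz code) shows how clearing the gradient isolates a single backward pass:

```python
import torch

# Two backward passes on fresh graphs add their gradients into x.grad
# unless x.grad is cleared in between.
x = torch.tensor(2.0, requires_grad=True)

loss = (x * 3) ** 2          # 9x^2, d/dx = 18x = 36 at x = 2
loss.backward()
first = x.grad.item()        # 36.0

loss = (x * 4) ** 2          # 16x^2, d/dx = 32x = 64 at x = 2
loss.backward()
accumulated = x.grad.item()  # 36 + 64 = 100.0

x.grad.zero_()               # clearing resets the accumulation
loss = (x * 4) ** 2
loss.backward()
alone = x.grad.item()        # 64.0, the second gradient on its own
```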
🧠 Conceptual
Intermediate · 1:30 remaining
Effect of calling loss.backward() multiple times without zeroing gradients
In PyTorch, what happens if you call loss.backward() multiple times on different losses without resetting gradients?
Attempts: 2 left
💡 Hint
Think about how PyTorch handles gradients by default.
✗ Incorrect
PyTorch accumulates gradients in the .grad attribute by default: each backward pass adds to whatever is already stored there. You must manually zero the gradients (typically with optimizer.zero_grad()) to avoid unintended accumulation.
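A minimal sketch of the usual fix, using a toy one-parameter model: calling zero_grad() at the top of each step ensures every backward pass starts from a clean .grad.

```python
import torch

# Without opt.zero_grad(), gradients from prior iterations would pile up
# in w.grad and corrupt every subsequent update.
w = torch.tensor(1.0, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

for _ in range(3):
    opt.zero_grad()          # clear accumulated gradients from the last step
    loss = (w * 2) ** 2      # 4w^2, so d(loss)/dw = 8w
    loss.backward()          # w.grad now holds only this step's gradient
    opt.step()               # w <- w - lr * w.grad
```

Each step multiplies w by (1 - 0.1 * 8) = 0.2, so after three steps w is about 0.008.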
❓ Hyperparameter
Advanced · 1:30 remaining
Choosing correct learning rate with backward pass
You notice your model's loss does not decrease after several backward passes and optimizer steps. Which learning rate adjustment is most likely to help?
Attempts: 2 left
💡 Hint
Think about what happens if the learning rate is too high.
✗ Incorrect
A learning rate that is too high can cause the optimizer to overshoot minima and fail to converge. Decreasing it often gets training moving again.
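The overshoot effect is easy to see on the toy loss f(x) = x² (gradient 2x), without any framework. This is an illustrative sketch, not training advice: each gradient step multiplies x by (1 - 2·lr), so any lr above 1 makes the iterate grow instead of shrink.

```python
# Plain gradient descent on f(x) = x^2; gradient is 2x.
def run(lr, steps=20, x=1.0):
    for _ in range(steps):
        x = x - lr * 2 * x   # x is scaled by (1 - 2*lr) each step
    return abs(x)

diverged = run(lr=1.5)   # |1 - 2*1.5| = 2, so |x| doubles every step
converged = run(lr=0.1)  # |1 - 2*0.1| = 0.8, so |x| shrinks toward 0
```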
🔧 Debug
Advanced · 1:30 remaining
Identifying error in backward pass usage
What error will this PyTorch code raise when calling loss.backward()?

import torch
x = torch.tensor(1.0)
y = x * 2
loss = y ** 2
loss.backward()
Attempts: 2 left
💡 Hint
Check if the tensor requires gradients.
✗ Incorrect
The tensor x does not have requires_grad=True, so no computation graph is built and loss has no grad_fn. Calling backward raises a RuntimeError complaining that the tensor does not require grad.
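The fix is a one-line change: create x with requires_grad=True so autograd records the graph. A sketch of both the failure and the corrected version:

```python
import torch

# Broken: no graph is recorded, so backward() raises RuntimeError.
bad = torch.tensor(1.0)
try:
    ((bad * 2) ** 2).backward()
except RuntimeError:
    pass  # "element 0 of tensors does not require grad ..."

# Fixed: requires_grad=True makes autograd track every operation on x.
x = torch.tensor(1.0, requires_grad=True)
y = x * 2
loss = y ** 2        # 4x^2, so d(loss)/dx = 8x
loss.backward()      # succeeds; x.grad is 8.0 at x = 1
```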
❓ Model Choice
Expert · 2:30 remaining
Choosing model for efficient backward pass in large-scale training
You want to train a very deep neural network with millions of parameters efficiently. Which approach helps reduce memory usage during the backward pass?
Attempts: 2 left
💡 Hint
Think about trading computation for memory during backward.
✗ Incorrect
Gradient checkpointing saves memory by not storing all intermediate activations during the forward pass; the missing activations are recomputed as needed during the backward pass, trading extra computation for a smaller memory footprint.
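A minimal sketch of the technique using torch.utils.checkpoint: the block's activations are discarded after the forward pass and recomputed when backward reaches it. The layer sizes here are arbitrary illustration values.

```python
import torch
from torch.utils.checkpoint import checkpoint

# A small sub-network whose activations we choose not to keep in memory.
block = torch.nn.Sequential(
    torch.nn.Linear(16, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 16),
)

x = torch.randn(4, 16, requires_grad=True)

# checkpoint() runs the block without saving intermediate activations;
# they are recomputed during the backward pass.
out = checkpoint(block, x, use_reentrant=False)
out.sum().backward()   # gradients flow to x and block's parameters as usual
```

In a deep network, wrapping every few layers this way roughly trades one extra forward pass through each checkpointed block for not having to store its activations.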