PyTorch · ~5 mins

Backward pass (loss.backward) in PyTorch - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is the purpose of the backward pass in PyTorch?
The backward pass computes gradients of the loss with respect to model parameters. It helps the model learn by showing how to adjust weights to reduce errors.
beginner
What does the method loss.backward() do in PyTorch?
It calculates the gradients of the loss tensor with respect to all tensors that have requires_grad=True. These gradients are stored in the .grad attribute of each tensor.
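A minimal sketch of the point above: after `loss.backward()`, the gradient of the loss with respect to each tracked tensor appears in its `.grad` attribute. The values here are chosen so the gradient is easy to verify by hand.

```python
import torch

# Tiny example: y = w * x, loss = (y - target)^2
w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(3.0)           # no requires_grad: treated as a constant
target = torch.tensor(7.0)

loss = (w * x - target) ** 2    # loss = (6 - 7)^2 = 1
loss.backward()                 # computes dloss/dw and stores it in w.grad

# Analytically: dloss/dw = 2 * (w*x - target) * x = 2 * (-1) * 3 = -6
print(w.grad)  # tensor(-6.)
```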
intermediate
Why do we need to call optimizer.zero_grad() before loss.backward()?
PyTorch accumulates gradients in .grad by default, so optimizer.zero_grad() clears the values left over from the previous iteration. Without it, gradients from successive backward passes would add together and corrupt the update.
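The accumulation behavior is easy to see on a single tensor. A sketch: for `loss = 2 * w`, each backward pass adds 2 to `w.grad`; zeroing the gradient first (which is what `optimizer.zero_grad()` does for the parameters it manages, by zeroing or resetting them to `None`) restores the correct value.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# First backward pass: dloss/dw = 2 for loss = 2 * w
(2 * w).backward()
print(w.grad)  # tensor(2.)

# Second backward pass WITHOUT clearing: gradients accumulate (2 + 2)
(2 * w).backward()
print(w.grad)  # tensor(4.)

# Clear first, as optimizer.zero_grad() would, then the value is correct again
w.grad.zero_()
(2 * w).backward()
print(w.grad)  # tensor(2.)
```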
beginner
What happens if you forget to call loss.backward() during training?
No gradients will be computed, so the optimizer cannot update the model weights. The model will not learn or improve.
intermediate
How does PyTorch know which operations to track for gradient computation?
PyTorch builds a computation graph dynamically during the forward pass. It tracks operations on tensors with requires_grad=True to compute gradients during the backward pass.
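You can inspect this graph directly: every tensor produced by a tracked operation carries a `grad_fn` node pointing back into the graph, and `detach()` cuts a tensor out of it. A small sketch:

```python
import torch

a = torch.tensor(3.0, requires_grad=True)
b = a * 2           # tracked: b.grad_fn references the multiply node
c = b.detach() + 1  # detach() severs the graph: c is not tracked

print(b.requires_grad, b.grad_fn is not None)  # True True
print(c.requires_grad, c.grad_fn)              # False None
```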
What does loss.backward() compute?
A. The optimizer step
B. Gradients of loss with respect to model parameters
C. The updated model weights
D. The loss value itself
Answer: B
Before calling loss.backward(), why do we call optimizer.zero_grad()?
A. To clear old gradients so they don't accumulate
B. To compute the loss value
C. To update the model weights
D. To save the model
Answer: A
If a tensor has requires_grad=False, what happens during loss.backward()?
A. Gradient is computed normally
B. The tensor is updated automatically
C. An error occurs
D. No gradient is computed for that tensor
Answer: D
What is stored in the .grad attribute after loss.backward()?
A. The updated tensor value
B. The loss value
C. The gradient of the loss with respect to the tensor
D. The optimizer state
Answer: C
What does PyTorch use to track operations for gradient calculation?
A. A dynamic computation graph
B. A static computation graph
C. A database
D. A configuration file
Answer: A
Explain in your own words what happens during the backward pass when you call loss.backward() in PyTorch.
Think about how PyTorch figures out how to change weights to reduce loss.
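As a reference answer, the whole cycle can be sketched as one training step; the model, data, and `opt` name here are illustrative, not from the original:

```python
import torch
import torch.nn as nn

# Hypothetical tiny setup for illustration
model = nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)
y = torch.randn(8, 1)

opt.zero_grad()                              # 1. clear stale gradients
loss = nn.functional.mse_loss(model(x), y)   # 2. forward pass builds the graph
loss.backward()                              # 3. backward pass fills p.grad
opt.step()                                   # 4. optimizer reads .grad to update weights

# every trainable parameter now has a populated gradient
assert all(p.grad is not None for p in model.parameters())
```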
Why is it important to call optimizer.zero_grad() before loss.backward() in a training loop?
Consider what happens if gradients from previous steps mix with current ones.