Recall & Review
beginner
What is gradient accumulation in PyTorch?
Gradient accumulation is a technique where gradients are summed over several mini-batches before a single weight update is applied. This simulates a larger effective batch size without the memory cost of actually loading a larger batch.
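As a minimal sketch (the linear model, random data, and `accum_steps = 4` are illustrative choices, not a prescribed recipe), an accumulation loop looks like this:

```python
import torch

# Minimal sketch of gradient accumulation; model, data, and accum_steps
# are hypothetical placeholders.
torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

accum_steps = 4  # update weights once every 4 mini-batches
batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(8)]

optimizer.zero_grad()
for i, (x, y) in enumerate(batches):
    loss = loss_fn(model(x), y) / accum_steps  # scale so the sum averages
    loss.backward()              # gradients add up in each parameter's .grad
    if (i + 1) % accum_steps == 0:
        optimizer.step()         # apply the accumulated gradient
        optimizer.zero_grad()    # start the next accumulation window clean
```

Dividing each loss by `accum_steps` makes the accumulated gradient match the average-loss gradient of one large batch of 4 × 8 = 32 examples.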
beginner
Why do we need to zero gradients in PyTorch during training?
We zero gradients to clear old gradient values from the previous backward pass. Without zeroing, gradients would keep accumulating unintentionally, leading to incorrect updates.
beginner
How do you zero gradients in PyTorch?
You call optimizer.zero_grad() before the backward pass to reset the gradients of all parameters the optimizer manages (by default, recent PyTorch versions set them to None, which behaves like zero for training purposes).
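A minimal sketch of the standard ordering inside a training step (the model and data here are placeholders):

```python
import torch

# Conventional per-step ordering; model and data are hypothetical.
torch.manual_seed(0)
model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(16, 2), torch.randn(16, 1)

for _ in range(3):
    optimizer.zero_grad()                # 1. clear stale gradients
    loss = ((model(x) - y) ** 2).mean()  # 2. forward pass
    loss.backward()                      # 3. compute fresh gradients
    optimizer.step()                     # 4. update the weights
```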
intermediate
What happens if you forget to zero gradients in a training loop?
Gradients from all previous batches accumulate, causing the model to update weights incorrectly and potentially harming training performance.
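The effect is easy to see on a single parameter. This toy example (a one-element tensor, not a real model) shows gradients summing across backward calls until zeroed:

```python
import torch

# Calling backward() twice without zeroing sums gradients in .grad.
w = torch.tensor([1.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

(w * 2).sum().backward()
first = w.grad.clone()   # tensor([2.])

(w * 2).sum().backward()
second = w.grad.clone()  # tensor([4.]) -- the old gradient was not cleared

opt.zero_grad(set_to_none=False)  # keep a zero tensor rather than None
cleared = w.grad.clone()          # tensor([0.])
```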
intermediate
How does gradient accumulation help when GPU memory is limited?
By accumulating gradients over several small batches, you can simulate a larger batch size without needing to fit a large batch in GPU memory at once.
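A sketch of why this works: for a loss that sums over examples, gradients accumulated from two half-batches match the gradient of the full batch. The model and data below are hypothetical.

```python
import torch

def grad_of(batches):
    # Fresh model with fixed weights so both runs start identically.
    model = torch.nn.Linear(3, 1)
    with torch.no_grad():
        model.weight.fill_(0.5)
        model.bias.fill_(0.0)
    for xb, yb in batches:
        loss = torch.nn.functional.mse_loss(model(xb), yb, reduction="sum")
        loss.backward()  # gradients accumulate across calls
    return model.weight.grad.clone()

torch.manual_seed(0)
x, y = torch.randn(8, 3), torch.randn(8, 1)

full = grad_of([(x, y)])                            # one 8-example batch
halves = grad_of([(x[:4], y[:4]), (x[4:], y[4:])])  # two 4-example batches
assert torch.allclose(full, halves)  # equal up to floating-point error
```

Peak activation memory in the `halves` run is that of a 4-example batch, yet the resulting gradient equals the 8-example batch's.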
What PyTorch function is used to clear gradients before a backward pass?
optimizer.zero_grad() resets all gradients to zero before computing new gradients.
Why accumulate gradients over multiple batches?
Accumulating gradients simulates a larger batch size without needing more memory.
What happens if you call optimizer.step() without zeroing gradients first?
If gradients are not zeroed, they accumulate, so optimizer.step() applies the combined gradients from multiple batches.
Which of these is NOT a reason to use gradient accumulation?
Gradient accumulation requires careful zeroing; it does not avoid zeroing gradients.
When should optimizer.zero_grad() be called in the training loop?
Zeroing gradients before the backward pass ensures gradients are fresh for the current batch.
Explain how gradient accumulation works and why it is useful in training deep learning models.
Think about training with small batches but wanting the effect of a big batch.
Describe the importance of zeroing gradients in PyTorch and what could happen if you skip this step.
Consider what happens if gradients keep adding up every batch.