What if your model kept mixing old mistakes with new ones, never truly learning?
Why Zero Gradients in PyTorch? Purpose & Use Cases
Imagine you are trying to teach a robot to learn from its mistakes by adjusting its actions little by little. Each time it learns, it remembers the changes it made before. But if it doesn't clear those old changes, it mixes old and new lessons, causing confusion.
Without clearing old adjustments (gradients), the robot adds up all past changes, making learning slow and messy. This leads to wrong updates and poor performance because the robot can't tell which lesson is new or old.
Zeroing gradients resets the robot's memory of past changes before each new lesson. This way, it only learns from the current mistake, making updates clear and effective. In PyTorch terms: loss.backward() adds new gradients into each parameter's .grad attribute rather than overwriting it, so you must call optimizer.zero_grad() before each backward pass.
# Buggy loop: forgot to zero gradients, so they accumulate across steps
loss.backward()
optimizer.step()

# Fixed loop: clear old gradients before computing new ones
optimizer.zero_grad()
loss.backward()
optimizer.step()
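Here is a minimal runnable sketch of the full pattern, using a hypothetical tiny linear model and random data as stand-ins for a real dataset:

```python
import torch
import torch.nn as nn

# Tiny stand-in model and synthetic data (illustrative only)
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x = torch.randn(32, 10)  # a batch of 32 examples
y = torch.randn(32, 1)

for step in range(5):
    optimizer.zero_grad()        # clear gradients left over from the last step
    loss = loss_fn(model(x), y)  # forward pass
    loss.backward()              # compute fresh gradients for this step only
    optimizer.step()             # update weights using those fresh gradients
```

The only ordering rule that matters is that zero_grad() runs before backward(); whether it sits at the top or bottom of the loop body is a style choice.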
Zeroing gradients gives each training step a clean slate, so every update reflects only the current batch's error.
When training a model to recognize cats in photos, zeroing gradients ensures each photo teaches the model fresh information without mixing old errors.
In short:
- Gradients record how each parameter should change to reduce the loss.
- Skipping zero_grad() mixes old and new gradients, corrupting the updates.
- Zeroing gradients before each backward pass keeps every learning step clean.
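You can see the accumulation directly with a single scalar parameter (the value 2.0 here is arbitrary, chosen only for illustration):

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

loss = w * 3          # d(loss)/dw = 3
loss.backward()
print(w.grad)         # tensor(3.)

loss = w * 3
loss.backward()       # without zeroing, the new gradient is added to the old one
print(w.grad)         # tensor(6.)

w.grad.zero_()        # reset, as optimizer.zero_grad() does for model parameters
loss = w * 3
loss.backward()
print(w.grad)         # tensor(3.) again: a clean, fresh gradient
```

The second backward() doubles the stored gradient, which is exactly the "mixing old and new lessons" problem described above.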