What if your model kept mixing old mistakes with new ones, never truly learning?
Why Zero Gradients in PyTorch? Purpose & Use Cases
Imagine you are trying to teach a robot to learn from its mistakes by adjusting its actions little by little. Each time it learns, it remembers the changes it made before. But if it doesn't clear those old changes, it mixes old and new lessons, causing confusion.
Without clearing old adjustments (gradients), the robot adds up all past changes, making learning slow and messy. This leads to wrong updates and poor performance because the robot can't tell which lesson is new or old.
Zeroing gradients resets the robot's memory of past changes before each new lesson. This way, it only learns from the current mistake, making updates clear and effective. In PyTorch terms: loss.backward() adds new gradients into each parameter's .grad attribute rather than overwriting it, so you must call optimizer.zero_grad() before each backward pass.
# Buggy loop: forgot to zero gradients, so they accumulate across steps
loss.backward()
optimizer.step()

# Fixed loop: clear old gradients before computing new ones
optimizer.zero_grad()
loss.backward()
optimizer.step()
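Here is a minimal runnable sketch of the full pattern, using a hypothetical tiny linear model and random data as stand-ins for a real dataset:

```python
import torch
import torch.nn as nn

# Tiny stand-in model and synthetic data (illustrative only)
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x = torch.randn(32, 10)  # a batch of 32 examples
y = torch.randn(32, 1)

for step in range(5):
    optimizer.zero_grad()        # clear gradients left over from the last step
    loss = loss_fn(model(x), y)  # forward pass
    loss.backward()              # compute fresh gradients for this step only
    optimizer.step()             # update weights using those fresh gradients
```

The only ordering rule that matters is that zero_grad() runs before backward(); whether it sits at the top or bottom of the loop body is a style choice.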
Zeroing gradients gives each training step a clean slate, so every update reflects only the current batch's error.
When training a model to recognize cats in photos, zeroing gradients ensures each photo teaches the model fresh information without mixing old errors.
In short:
- Gradients record how each parameter should change to reduce the loss.
- Skipping zero_grad() mixes old and new gradients, corrupting the updates.
- Zeroing gradients before each backward pass keeps every learning step clean.
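You can see the accumulation directly with a single scalar parameter (the value 2.0 here is arbitrary, chosen only for illustration):

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

loss = w * 3          # d(loss)/dw = 3
loss.backward()
print(w.grad)         # tensor(3.)

loss = w * 3
loss.backward()       # without zeroing, the new gradient is added to the old one
print(w.grad)         # tensor(6.)

w.grad.zero_()        # reset, as optimizer.zero_grad() does for model parameters
loss = w * 3
loss.backward()
print(w.grad)         # tensor(3.) again: a clean, fresh gradient
```

The second backward() doubles the stored gradient, which is exactly the "mixing old and new lessons" problem described above.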