
Zeroing gradients in PyTorch

Introduction
Zeroing gradients clears old gradient values before calculating new ones. PyTorch accumulates gradients by default, so skipping this step mixes stale gradients from earlier iterations into the current update. Zero gradients:
- Before starting a new training step in a neural network.
- Before computing new gradients with loss.backward() during backpropagation.
- To avoid unintentionally accumulating gradients across multiple batches.
- When training with mini-batches, so each batch gets a fresh gradient calculation.
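The accumulation behavior is easy to see on a single tensor. A minimal sketch (the values here are chosen purely for illustration):

```python
import torch

# A single parameter with autograd enabled
w = torch.tensor([3.0], requires_grad=True)

# First backward pass: d(2w)/dw = 2
(w * 2).sum().backward()
print(w.grad)  # tensor([2.])

# Second backward pass WITHOUT zeroing: gradients accumulate (2 + 2 = 4)
(w * 2).sum().backward()
print(w.grad)  # tensor([4.])

# Clearing the gradient restores a fresh start
w.grad = None
(w * 2).sum().backward()
print(w.grad)  # tensor([2.])
```

The second backward call adds to, rather than replaces, the stored gradient — exactly the behavior zeroing is meant to prevent.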
Syntax
PyTorch
optimizer.zero_grad()
This call resets the gradients of every parameter the optimizer manages (the gradients live on the parameters themselves, in their .grad attribute).
It is usually called before loss.backward() in the training loop.
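In context, a typical training loop looks like the sketch below (the data here is random and stands in for a real DataLoader):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# Toy batches standing in for a real DataLoader (hypothetical data)
data = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(3)]

for inputs, targets in data:
    optimizer.zero_grad()                    # 1. clear stale gradients
    loss = loss_fn(model(inputs), targets)   # 2. forward pass
    loss.backward()                          # 3. compute fresh gradients
    optimizer.step()                         # 4. update weights

print(f"final loss: {loss.item():.4f}")
```

The exact position of zero_grad() within the loop body matters less than the invariant it enforces: each backward() starts from cleared gradients.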
Examples
Clears gradients before computing new ones in a training step.
PyTorch
optimizer.zero_grad()
Alternative way to clear gradients by setting them to None; optimizer.zero_grad() does the same when called with set_to_none=True, which is the default in recent PyTorch versions.
PyTorch
for param in model.parameters():
    param.grad = None
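The two approaches can be seen side by side. A small sketch, assuming a recent PyTorch where zero_grad takes set_to_none:

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Produce some gradients
loss = model(torch.randn(1, 2)).sum()
loss.backward()
assert all(p.grad is not None for p in model.parameters())

# set_to_none=True frees the gradient tensors instead of filling them with zeros
optimizer.zero_grad(set_to_none=True)
assert all(p.grad is None for p in model.parameters())
```

Setting gradients to None skips allocating and writing zero tensors, which is why it can be the more efficient option.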
Sample Model
This code shows how zeroing gradients works in a simple training step. It clears old gradients, computes new ones, updates weights, and clears gradients again.
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim

# Simple linear model
model = nn.Linear(2, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Dummy input and target
inputs = torch.tensor([[1.0, 2.0]])
target = torch.tensor([[1.0]])

# Forward pass
output = model(inputs)

# Compute mean squared error loss
loss_fn = nn.MSELoss()
loss = loss_fn(output, target)

# Zero gradients before backward pass
optimizer.zero_grad()

# Backward pass to compute gradients
loss.backward()

# Save a copy of the gradients before the optimizer step
grads_before = [param.grad.clone() for param in model.parameters()]

# Update weights
optimizer.step()

# Zero gradients again for the next step
optimizer.zero_grad()

# With set_to_none=True (the default in recent PyTorch) these are now None;
# pass set_to_none=False to get zero-filled tensors instead
gradients_after = [param.grad for param in model.parameters()]

print("Gradients before optimizer step:")
for g in grads_before:
    print(g)

print("\nGradients after zero_grad call:")
for g in gradients_after:
    print(g)
Important Notes
Always zero gradients before calling loss.backward() to avoid mixing gradients from multiple steps.
Setting param.grad = None (or calling optimizer.zero_grad(set_to_none=True)) can be more memory- and time-efficient than filling the gradient tensors with zeros.
Forgetting to zero gradients can cause your model to learn incorrectly.
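To make the last note concrete, here is a sketch of the failure mode with illustrative values: at w = 1 the gradient of w² is 2, so one SGD step with lr = 0.1 gives w = 0.8, whose gradient is 1.6 — but without zeroing, the stale 2.0 is still there and the second step sees 3.6.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

# Step 1: gradient of w**2 at w = 1 is 2
(w ** 2).backward()
opt.step()  # w becomes 1 - 0.1 * 2 = 0.8

# Step 2 WITHOUT zeroing: the new gradient 1.6 is ADDED to the stale 2.0
(w ** 2).backward()
print(w.grad)  # tensor([3.6000]) instead of tensor([1.6000])
```

The optimizer would now take a step 2.25x too large — the model still trains, just on the wrong gradients, which is what makes this bug hard to spot.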
Summary
Zeroing gradients clears old gradient values before new calculations.
Use optimizer.zero_grad() before loss.backward() in training loops.
This ensures correct and fresh gradient updates for model learning.