
Zeroing gradients in PyTorch

Introduction
Zeroing gradients clears old gradient values before calculating new ones. PyTorch accumulates gradients by default, so skipping this step mixes stale gradients from earlier iterations into the current update. Zero gradients:
- Before starting a new training step in a neural network.
- Before computing new gradients with loss.backward() during backpropagation.
- To avoid unintentionally accumulating gradients across multiple batches.
- When training with mini-batches, so each batch gets a fresh gradient calculation.
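The accumulation behavior is easy to see on a single tensor. A minimal sketch (the values here are chosen purely for illustration):

```python
import torch

# A single parameter with autograd enabled
w = torch.tensor([3.0], requires_grad=True)

# First backward pass: d(2w)/dw = 2
(w * 2).sum().backward()
print(w.grad)  # tensor([2.])

# Second backward pass WITHOUT zeroing: gradients accumulate (2 + 2 = 4)
(w * 2).sum().backward()
print(w.grad)  # tensor([4.])

# Clearing the gradient restores a fresh start
w.grad = None
(w * 2).sum().backward()
print(w.grad)  # tensor([2.])
```

The second backward call adds to, rather than replaces, the stored gradient — exactly the behavior zeroing is meant to prevent.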
Syntax
PyTorch
optimizer.zero_grad()
This call resets the gradients of every parameter the optimizer manages (the gradients live on the parameters themselves, in their .grad attribute).
It is usually called before loss.backward() in the training loop.
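In context, a typical training loop looks like the sketch below (the data here is random and stands in for a real DataLoader):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# Toy batches standing in for a real DataLoader (hypothetical data)
data = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(3)]

for inputs, targets in data:
    optimizer.zero_grad()                    # 1. clear stale gradients
    loss = loss_fn(model(inputs), targets)   # 2. forward pass
    loss.backward()                          # 3. compute fresh gradients
    optimizer.step()                         # 4. update weights

print(f"final loss: {loss.item():.4f}")
```

The exact position of zero_grad() within the loop body matters less than the invariant it enforces: each backward() starts from cleared gradients.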
Examples
Clears gradients before computing new ones in a training step.
PyTorch
optimizer.zero_grad()
Alternative way to clear gradients by setting them to None; optimizer.zero_grad() does the same when called with set_to_none=True, which is the default in recent PyTorch versions.
PyTorch
for param in model.parameters():
    param.grad = None
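The two approaches can be seen side by side. A small sketch, assuming a recent PyTorch where zero_grad takes set_to_none:

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Produce some gradients
loss = model(torch.randn(1, 2)).sum()
loss.backward()
assert all(p.grad is not None for p in model.parameters())

# set_to_none=True frees the gradient tensors instead of filling them with zeros
optimizer.zero_grad(set_to_none=True)
assert all(p.grad is None for p in model.parameters())
```

Setting gradients to None skips allocating and writing zero tensors, which is why it can be the more efficient option.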
Sample Model
This code shows how zeroing gradients works in a simple training step. It clears old gradients, computes new ones, updates weights, and clears gradients again.
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim

# Simple linear model
model = nn.Linear(2, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Dummy input and target
inputs = torch.tensor([[1.0, 2.0]])
target = torch.tensor([[1.0]])

# Forward pass
output = model(inputs)

# Compute mean squared error loss
loss_fn = nn.MSELoss()
loss = loss_fn(output, target)

# Zero gradients before backward pass
optimizer.zero_grad()

# Backward pass to compute gradients
loss.backward()

# Save a copy of the gradients before the optimizer step
grads_before = [param.grad.clone() for param in model.parameters()]

# Update weights
optimizer.step()

# Zero gradients again for the next step
optimizer.zero_grad()

# With set_to_none=True (the default in recent PyTorch) these are now None;
# pass set_to_none=False to get zero-filled tensors instead
gradients_after = [param.grad for param in model.parameters()]

print("Gradients before optimizer step:")
for g in grads_before:
    print(g)

print("\nGradients after zero_grad call:")
for g in gradients_after:
    print(g)
Important Notes
Always zero gradients before calling loss.backward() to avoid mixing gradients from multiple steps.
Setting param.grad = None (or calling optimizer.zero_grad(set_to_none=True)) can be more memory- and time-efficient than filling the gradient tensors with zeros.
Forgetting to zero gradients can cause your model to learn incorrectly.
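To make the last note concrete, here is a sketch of the failure mode with illustrative values: at w = 1 the gradient of w² is 2, so one SGD step with lr = 0.1 gives w = 0.8, whose gradient is 1.6 — but without zeroing, the stale 2.0 is still there and the second step sees 3.6.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

# Step 1: gradient of w**2 at w = 1 is 2
(w ** 2).backward()
opt.step()  # w becomes 1 - 0.1 * 2 = 0.8

# Step 2 WITHOUT zeroing: the new gradient 1.6 is ADDED to the stale 2.0
(w ** 2).backward()
print(w.grad)  # tensor([3.6000]) instead of tensor([1.6000])
```

The optimizer would now take a step 2.25x too large — the model still trains, just on the wrong gradients, which is what makes this bug hard to spot.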
Summary
Zeroing gradients clears old gradient values before new calculations.
Use optimizer.zero_grad() before loss.backward() in training loops.
This ensures correct and fresh gradient updates for model learning.