PyTorch · ML · ~20 mins

Zeroing gradients in PyTorch - ML Experiment: Train & Evaluate

Experiment - Zeroing gradients
Problem: You are training a simple neural network on a small dataset. The model's loss is not decreasing as expected during training.
Current Metrics: Epoch 1: loss=0.85, Epoch 2: loss=0.83, Epoch 3: loss=0.82 (minimal improvement)
Issue: The gradients are not being zeroed before each backward pass, causing gradient accumulation and incorrect weight updates.
Your Task
Fix the training loop to zero gradients properly so that the loss decreases steadily over epochs.
Do not change the model architecture.
Do not change the optimizer or learning rate.
Only modify the training loop to correctly zero gradients.
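The original buggy loop is not shown on this page, but given the setup in the solution below, it likely looks like the following sketch: the same model, loss, and optimizer, with the only defect being that gradients are never cleared between iterations.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Same toy dataset and model as the solution
X = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
y = torch.tensor([[2.0], [4.0], [6.0], [8.0]])

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

for epoch in range(10):
    outputs = model(X)
    loss = criterion(outputs, y)
    loss.backward()   # BUG: adds to gradients left over from previous epochs
    optimizer.step()  # step uses accumulated (stale + new) gradients
    print(f"Epoch {epoch+1}: loss={loss.item():.4f}")
```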
Solution
import torch
import torch.nn as nn
import torch.optim as optim

# Simple dataset
X = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
y = torch.tensor([[2.0], [4.0], [6.0], [8.0]])

# Simple linear model
model = nn.Linear(1, 1)

# Mean squared error loss
criterion = nn.MSELoss()

# SGD optimizer
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training loop
for epoch in range(10):
    optimizer.zero_grad()  # Zero gradients before backward pass
    outputs = model(X)
    loss = criterion(outputs, y)
    loss.backward()
    optimizer.step()
    print(f"Epoch {epoch+1}: loss={loss.item():.4f}")
Added optimizer.zero_grad() at the start of each training iteration to reset gradients.
Ensured zero_grad() is called before loss.backward() to prevent gradient accumulation.
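As a side note, recent PyTorch versions also let zero_grad() clear gradients by setting them to None rather than filling them with zeros, which skips a memset and can save memory. A quick sketch of that variant:

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(1, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Produce some gradients
loss = nn.MSELoss()(model(torch.tensor([[1.0]])), torch.tensor([[2.0]]))
loss.backward()

# set_to_none=True frees the gradient tensors instead of zeroing them in place
optimizer.zero_grad(set_to_none=True)
print(all(p.grad is None for p in model.parameters()))  # True
```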
Results Interpretation

Before Fix: Loss decreased very slowly (0.85 to 0.82 in 3 epochs) due to gradient accumulation causing incorrect updates.

After Fix: Loss decreased rapidly and steadily (4.5 to near 0 in 10 epochs) showing proper training progress.

Zeroing gradients before each backward pass is essential in PyTorch to avoid gradient accumulation and ensure correct model training.
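The accumulation behavior is easy to verify directly on a single tensor: calling backward() twice without zeroing in between doubles the stored gradient.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

# d(x^2)/dx = 2x = 6
(x ** 2).backward()
first = x.grad.item()   # 6.0

# A second backward pass ADDS to the existing gradient: 6 + 6 = 12
(x ** 2).backward()
second = x.grad.item()  # 12.0

print(first, second)  # 6.0 12.0
```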
Bonus Experiment
Try training the model without zeroing gradients but manually resetting gradients by setting each parameter's grad to zero.
💡 Hint
Loop over model.parameters() and set each param.grad to None before the backward pass (setting it to None works even on the first iteration, before any gradient exists; param.grad.zero_() only works once .grad has been populated).