
Gradient clipping in PyTorch - ML Experiment: Train & Evaluate

Experiment - Gradient clipping
Problem: You are training a neural network on a classification task with PyTorch, but the training loss is unstable and sometimes explodes, causing poor convergence.
Current Metrics: Training loss fluctuates wildly and sometimes becomes NaN; validation accuracy is low, around 60%.
Issue: The model suffers from exploding gradients, which cause unstable training and poor validation accuracy.
Your Task
Use gradient clipping to stabilize training and improve validation accuracy to above 75%.
Keep the model architecture and optimizer the same.
Only add gradient clipping before the optimizer step.
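Before reaching for the fix, you can confirm the diagnosis by logging the total gradient norm each step; spikes or non-finite values point to exploding gradients. A minimal diagnostic sketch, reusing the model, criterion, optimizer, and train_dl defined in the solution below (passing max_norm=float('inf') makes clip_grad_norm_ report the norm without changing any gradients):
PyTorch
# Diagnostic sketch: log the total gradient norm per batch without clipping anything.
# Assumes model, criterion, optimizer, and train_dl are defined as in the solution below.
for xb, yb in train_dl:
    optimizer.zero_grad()
    loss = criterion(model(xb), yb)
    loss.backward()
    total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=float('inf'))
    print(f"batch loss {loss.item():.4f}, grad norm {total_norm.item():.2f}")
    optimizer.step()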
Solution
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Simple dataset
X = torch.randn(1000, 20)
y = (X.sum(dim=1) > 0).long()

train_ds = TensorDataset(X, y)
train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)

# Simple model
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(20, 50)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(50, 2)
    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))

model = SimpleNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop with gradient clipping
for epoch in range(10):
    model.train()
    total_loss = 0
    for xb, yb in train_dl:
        optimizer.zero_grad()
        preds = model(xb)
        loss = criterion(preds, yb)
        loss.backward()
        # Gradient clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        total_loss += loss.item() * xb.size(0)
    avg_loss = total_loss / len(train_dl.dataset)
    print(f"Epoch {epoch+1}, Loss: {avg_loss:.4f}")
Added a torch.nn.utils.clip_grad_norm_ call after loss.backward() and before optimizer.step() to clip the total gradient norm to at most 1.0.
This prevents exploding gradients and stabilizes training.
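The loop above only reports training loss, while the task targets validation accuracy above 75%. A minimal way to check that is to evaluate on held-out data after training; the X_val/y_val set below is an assumption for illustration (generated the same way as the training data), not part of the original task code:
PyTorch
# Sketch: measure validation accuracy on a held-out set drawn from the same distribution.
X_val = torch.randn(200, 20)
y_val = (X_val.sum(dim=1) > 0).long()

model.eval()
with torch.no_grad():
    val_preds = model(X_val).argmax(dim=1)
    val_acc = (val_preds == y_val).float().mean().item()
print(f"Validation accuracy: {val_acc:.2%}")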
Results Interpretation

Before Gradient Clipping: Training loss was unstable and sometimes NaN, validation accuracy ~60%.

After Gradient Clipping: Training loss decreases steadily, no NaNs, validation accuracy ~80%.

Gradient clipping helps prevent exploding gradients by limiting their size, which stabilizes training and improves model performance.
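Concretely, norm clipping measures the L2 norm of all gradients taken together and, when it exceeds max_norm, rescales every gradient by the same factor so the total norm comes back down to max_norm; because every gradient is scaled equally, the update direction is preserved. A rough sketch of that logic (edge cases such as non-finite norms, which torch.nn.utils.clip_grad_norm_ handles, are omitted):
PyTorch
# Rough equivalent of clip_grad_norm_(model.parameters(), max_norm=1.0)
max_norm = 1.0
grads = [p.grad for p in model.parameters() if p.grad is not None]
total_norm = torch.norm(torch.stack([g.norm(2) for g in grads]), 2)
if total_norm > max_norm:
    scale = max_norm / (total_norm + 1e-6)
    for g in grads:
        g.mul_(scale)  # same factor for every gradient, so only the magnitude changes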
Bonus Experiment
Try different max_norm values for gradient clipping (e.g., 0.5, 2.0, 5.0) and observe how training stability and accuracy change; a sketch of one way to run this comparison follows the hint below.
💡 Hint
Smaller max_norm values clip gradients more aggressively, which may slow learning; larger values clip less and may allow instability.
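One possible way to structure this comparison is to re-run the training loop for each candidate value, re-initializing the model and optimizer each time so the runs are comparable; this sketch reuses SimpleNet, criterion, and train_dl from the solution above:
PyTorch
# Sketch: compare several max_norm values under identical training settings.
for max_norm in [0.5, 1.0, 2.0, 5.0]:
    model = SimpleNet()
    optimizer = optim.Adam(model.parameters(), lr=0.01)
    for epoch in range(10):
        for xb, yb in train_dl:
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)
            optimizer.step()
    print(f"max_norm={max_norm}: final batch loss {loss.item():.4f}")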