Gradient clipping keeps training stable by capping how large the gradients, and therefore the weight updates, can get. Without it, a single oversized update can push the model's weights far off course and derail training.
Gradient clipping in PyTorch
Introduction
Gradient clipping is most useful in situations such as:
When training deep neural networks that sometimes produce very large gradients.
When the training loss suddenly spikes or becomes unstable.
When using recurrent neural networks (RNNs), which are prone to exploding gradients.
When you want training to stay smooth and keep the model weights from changing too much in a single step.
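The RNN case above comes from repeated multiplication through time. The toy snippet below (not from the article; the weight value and step count are made up for illustration) reproduces that mechanism with a single scalar weight, then shows clipping taming the result:

```python
import torch

# Toy illustration of exploding gradients: multiplying by a weight w > 1
# over many "time steps" -- the same repeated multiplication an unrolled
# RNN performs -- makes the gradient grow exponentially.
w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(1.0)
for _ in range(30):      # 30 unrolled steps
    x = w * x            # x ends up as w**30
x.backward()             # d(w**30)/dw = 30 * w**29
print(w.grad)            # ~1.6e10: far too large for a sane update

# Capping the gradient norm tames it before any optimizer step.
torch.nn.utils.clip_grad_norm_([w], max_norm=1.0)
print(w.grad)            # now ~1.0
```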
Syntax
PyTorch
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
# or
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value)
clip_grad_norm_ limits the total size (L2 norm) of all gradients taken together.
clip_grad_value_ limits each gradient value individually.
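A quick way to see the difference between the two calls is to fill a layer's gradients with a known value and apply each function; the layer shape and the all-3.0 gradients below are made up for illustration:

```python
import torch
import torch.nn as nn

# A 2x2 linear layer: 4 weight entries + 2 bias entries = 6 gradient values.
layer = nn.Linear(2, 2)
for p in layer.parameters():
    p.grad = torch.full_like(p, 3.0)   # pretend backward() produced all-3.0 grads

# clip_grad_norm_ rescales ALL gradients together; it returns the norm
# they had BEFORE clipping (sqrt(6 * 3.0**2) ~ 7.35 here).
before = torch.nn.utils.clip_grad_norm_(layer.parameters(), max_norm=1.0)
print(f"norm before: {before:.2f}")    # ~7.35
print(layer.bias.grad)                 # every entry rescaled to ~0.41

# clip_grad_value_ instead caps each entry independently at +/-0.5.
for p in layer.parameters():
    p.grad = torch.full_like(p, 3.0)
torch.nn.utils.clip_grad_value_(layer.parameters(), clip_value=0.5)
print(layer.bias.grad)                 # every entry is now exactly 0.5
```

Note that norm clipping preserves the direction of the overall gradient vector, while value clipping can change it.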
Examples
This clips the gradients so their total norm does not exceed 1.0.
PyTorch
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
This clips each gradient value to be between -0.5 and 0.5.
PyTorch
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=0.5)
Sample Model
This example shows how to clip gradients to a maximum norm of 1.0 during training. It prints the gradient norm before and after clipping to see the effect.
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim

# Simple model
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 1)

    def forward(self, x):
        return self.linear(x)

# Create model, loss, optimizer
model = SimpleNet()
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Dummy data
inputs = torch.tensor([[10.0, 20.0], [30.0, 40.0]])
targets = torch.tensor([[1.0], [2.0]])

# Forward pass
outputs = model(inputs)
loss = criterion(outputs, targets)

# Backward pass
loss.backward()

# Before clipping: print gradient norm
total_norm = 0
for p in model.parameters():
    if p.grad is not None:
        param_norm = p.grad.data.norm(2)
        total_norm += param_norm.item() ** 2
total_norm = total_norm ** 0.5
print(f"Gradient norm before clipping: {total_norm:.4f}")

# Clip gradients
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# After clipping: print gradient norm
total_norm = 0
for p in model.parameters():
    if p.grad is not None:
        param_norm = p.grad.data.norm(2)
        total_norm += param_norm.item() ** 2
total_norm = total_norm ** 0.5
print(f"Gradient norm after clipping: {total_norm:.4f}")

# Optimizer step
optimizer.step()
Important Notes
Gradient clipping should be done after calling loss.backward() and before optimizer.step().
Clipping helps prevent the problem known as 'exploding gradients', which can make training unstable.
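The ordering described above can be sketched as a minimal training loop; the model, data, and hyperparameters below are placeholders, not values from the article:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Placeholder model and data for the sketch.
model = nn.Linear(4, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 4)
y = torch.randn(8, 1)

for epoch in range(3):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()                                          # 1. compute gradients
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # 2. clip, after backward()
    optimizer.step()                                         # 3. update, after clipping
```

Clipping between `backward()` and `step()` matters because `backward()` is what populates the gradients, and `step()` is what consumes them; clipping anywhere else either sees no gradients or acts too late.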
Summary
Gradient clipping keeps training stable by limiting how big gradients can get.
Use clip_grad_norm_ to limit total gradient size or clip_grad_value_ to limit individual values.
Always clip gradients after backward pass and before optimizer step.