What is gradient clipping in PyTorch?


Introduction

Gradient clipping keeps training stable by limiting the size of the gradients before the optimizer applies them. This prevents a single oversized update from moving the weights so far that the loss diverges.

Use gradient clipping:

When training deep neural networks whose gradients occasionally become very large.

When the training loss suddenly spikes or becomes unstable.

When using recurrent neural networks (RNNs), which are prone to exploding gradients.

When you want to keep training smooth and prevent the weights from changing too much in a single step.
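To make the "big jumps" concrete, here is a toy sketch in plain Python (no PyTorch) that runs gradient descent on f(w) = w**4 with and without clipping. The function `run`, the starting point 3.0, and the learning rate 0.1 are all made up for illustration. With a single scalar weight, clipping by norm and clipping by value are the same operation.

```python
def run(steps, lr, max_grad=None):
    """Gradient descent on f(w) = w**4, optionally clamping the gradient."""
    w = 3.0
    for _ in range(steps):
        grad = 4.0 * w * w * w  # derivative of w**4
        if max_grad is not None:
            # Clip the scalar gradient to the range [-max_grad, max_grad]
            grad = max(-max_grad, min(max_grad, grad))
        w -= lr * grad
    return w

w_unclipped = run(steps=3, lr=0.1)               # each step overshoots further
w_clipped = run(steps=3, lr=0.1, max_grad=1.0)   # each step moves at most lr * 1.0

print(f"without clipping: w = {w_unclipped:.1f}")
print(f"with clipping:    w = {w_clipped:.1f}")
```

Without clipping, the weight overshoots the minimum by a wider margin on every step and reaches a magnitude in the millions after three updates. With clipping, each update moves the weight by at most 0.1.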

Syntax

PyTorch

torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

# or

torch.nn.utils.clip_grad_value_(model.parameters(), clip_value)

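clip_grad_norm_ limits the total norm (L2 by default) of all gradients taken together. If the norm exceeds max_norm, every gradient is scaled down by the same factor, so the direction of the update is preserved.

clip_grad_value_ clamps each gradient element individually to the range [-clip_value, clip_value], which can change the direction of the update.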
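The difference between the two strategies can be sketched in plain Python. This is an illustration of the arithmetic only, not PyTorch's implementation: the helpers `clip_by_norm` and `clip_by_value` and the gradient values are hypothetical, and the small epsilon in the denominator is an assumption added to avoid division by zero.

```python
import math

def clip_by_norm(grads, max_norm):
    """Scale all gradients by one shared factor so the total L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-6))  # never scale gradients up
    return [g * scale for g in grads], total_norm

def clip_by_value(grads, clip_value):
    """Clamp each gradient element independently to [-clip_value, clip_value]."""
    return [max(-clip_value, min(clip_value, g)) for g in grads]

grads = [3.0, -4.0, 0.1]  # made-up gradient values

norm_clipped, norm_before = clip_by_norm(grads, max_norm=1.0)
value_clipped = clip_by_value(grads, clip_value=0.5)

print("norm before clipping:", norm_before)
print("clipped by norm:     ", norm_clipped)   # same ratios between elements as in grads
print("clipped by value:    ", value_clipped)  # large elements flattened to +/-0.5
```

Norm clipping keeps the ratio between the elements (3 : -4 : 0.1), while value clipping flattens the two large elements to the same magnitude and leaves the small one untouched.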

Examples

This clips the gradients so their total norm does not exceed 1.0.

PyTorch

torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

This clips each gradient value to be between -0.5 and 0.5.

PyTorch

torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=0.5)

Sample Model

This example shows how to clip gradients to a maximum norm of 1.0 during training. It prints the gradient norm before and after clipping so the effect is visible.

PyTorch

import torch
import torch.nn as nn
import torch.optim as optim

# Simple model
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 1)

    def forward(self, x):
        return self.linear(x)

# Create model, loss, optimizer
model = SimpleNet()
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Dummy data
inputs = torch.tensor([[10.0, 20.0], [30.0, 40.0]])
targets = torch.tensor([[1.0], [2.0]])

# Forward pass
outputs = model(inputs)
loss = criterion(outputs, targets)

# Backward pass
loss.backward()

# Helper: total L2 norm over all parameter gradients
def grad_norm(parameters):
    norms = [p.grad.detach().norm(2) for p in parameters if p.grad is not None]
    return torch.norm(torch.stack(norms), 2).item()

# Before clipping: print gradient norm
print(f"Gradient norm before clipping: {grad_norm(model.parameters()):.4f}")

# Clip gradients so that their total L2 norm is at most 1.0
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# After clipping: print gradient norm
print(f"Gradient norm after clipping: {grad_norm(model.parameters()):.4f}")

# Optimizer step
optimizer.step()

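Output

The norm before clipping depends on the random weight initialization, so it varies between runs. After clipping, the printed norm is at most 1.0.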
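A shorter variant of the same measurement is sketched below. It relies on clip_grad_norm_ returning the total gradient norm as measured before clipping, so no manual loop is needed for the "before" value. The fixed seed and the use of a bare nn.Linear in place of SimpleNet are choices made for this sketch, not part of the example above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so repeated runs give the same numbers

model = nn.Linear(2, 1)
inputs = torch.tensor([[10.0, 20.0], [30.0, 40.0]])
targets = torch.tensor([[1.0], [2.0]])

loss = nn.MSELoss()(model(inputs), targets)
loss.backward()

# clip_grad_norm_ returns the total norm measured before clipping
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# Recompute the total norm from the gradients as they are now (after clipping)
norm_after = torch.norm(
    torch.stack([p.grad.norm(2) for p in model.parameters()]), 2
)

print(f"Gradient norm before clipping: {norm_before.item():.4f}")
print(f"Gradient norm after clipping: {norm_after.item():.4f}")
```

Logging the returned value at every step is a cheap way to spot gradient spikes during training.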

Important Notes

Gradient clipping should be done after calling loss.backward() and before optimizer.step().

Clipping limits the damage from 'exploding gradients', where gradient magnitudes grow so large that training becomes unstable.
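The ordering described in these notes can be sketched as a short training loop. The model, data, learning rate, and step count are placeholders chosen for illustration. The call to optimizer.zero_grad() is an addition not shown in the example above; it is included so that gradients from one step do not accumulate into the next.

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)

inputs = torch.tensor([[10.0, 20.0], [30.0, 40.0]])
targets = torch.tensor([[1.0], [2.0]])

norms = []
for step in range(5):
    optimizer.zero_grad()                     # 1. clear gradients from the previous step
    loss = criterion(model(inputs), targets)  # 2. forward pass
    loss.backward()                           # 3. backward pass fills p.grad
    norm = torch.nn.utils.clip_grad_norm_(    # 4. clip after backward ...
        model.parameters(), max_norm=1.0
    )
    optimizer.step()                          # 5. ... and before the optimizer step
    norms.append(norm.item())                 # pre-clipping norm, kept for inspection

print(norms)
```

Clipping after optimizer.step() would have no effect on that step's update, because the weights have already been changed using the unclipped gradients.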

Summary

Gradient clipping keeps training stable by limiting how big gradients can get.

Use clip_grad_norm_ to limit total gradient size or clip_grad_value_ to limit individual values.

Always clip gradients after the backward pass and before the optimizer step.

Practice


1. What is the main purpose of gradient clipping in PyTorch training?

easy

A. To prevent gradients from becoming too large and destabilizing training

B. To increase the learning rate automatically during training

C. To save memory by reducing model size

D. To initialize model weights before training


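Solution

Step 1: Understand gradient behavior during training. Gradients can occasionally become very large, which leads to oversized weight updates and unstable training.

Step 2: Role of gradient clipping. Clipping limits the gradient norm or the individual gradient values before the optimizer step, so updates stay bounded. It does not change the learning rate, reduce model size, or initialize weights.

Final Answer: A. To prevent gradients from becoming too large and destabilizing training.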
