What is Learning rate differential in PyTorch?

Introduction

Learning rate differential means using different learning rates for different parts of a model. Each part then changes at its own speed during training, so some layers can be updated cautiously while others are updated more aggressively.

It is useful in these situations:

When fine-tuning a pre-trained model, and you want the new layers to train faster than the pre-trained ones.

When different parts of the model learn at different speeds and need separate learning rates.

When combining a large model with a small new module, and you want to control how fast each one trains.

When experimenting with slower updates for some layers to improve training stability.

Syntax

PyTorch

optimizer = torch.optim.SGD([
    {'params': model.part1.parameters(), 'lr': 0.001},
    {'params': model.part2.parameters(), 'lr': 0.01}
], momentum=0.9)

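You pass a list of dictionaries (parameter groups) to the optimizer, each with its own learning rate.

Each dictionary must have a 'params' key with the parameters to optimize. The 'lr' key is optional: a group that omits it uses the default lr passed to the optimizer as a keyword argument. Other keyword arguments, such as momentum here, apply to every group unless a group overrides them.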
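To illustrate how per-group options and optimizer-wide defaults combine, here is a minimal sketch. The two-part model is a hypothetical stand-in, not part of the original lesson; the first group leaves out 'lr' and so picks up the keyword default.

```python
import torch
import torch.nn as nn

# Hypothetical two-part model, used only for illustration.
model = nn.ModuleDict({
    "part1": nn.Linear(10, 5),
    "part2": nn.Linear(5, 2),
})

optimizer = torch.optim.SGD(
    [
        {"params": model["part1"].parameters()},              # no 'lr': uses the default below
        {"params": model["part2"].parameters(), "lr": 0.01},  # explicit per-group value
    ],
    lr=0.001,      # default learning rate for groups that do not set one
    momentum=0.9,  # applies to both groups
)

# Each entry of optimizer.param_groups is a dict holding the resolved options.
group_lrs = [group["lr"] for group in optimizer.param_groups]
group_momenta = [group["momentum"] for group in optimizer.param_groups]
print(group_lrs)
print(group_momenta)
```

Reading optimizer.param_groups after construction is a quick way to confirm that each group ended up with the learning rate you intended.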

Examples

Using Adam optimizer with a smaller learning rate for the base and a larger one for the head.

PyTorch

optimizer = torch.optim.Adam([
    {'params': model.base.parameters(), 'lr': 0.0001},
    {'params': model.head.parameters(), 'lr': 0.001}
])

Using SGD with momentum and different learning rates for two layers.

PyTorch

optimizer = torch.optim.SGD([
    {'params': model.layer1.parameters(), 'lr': 0.01},
    {'params': model.layer2.parameters(), 'lr': 0.001}
], momentum=0.9)
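The learning rates set at construction can also be changed later, because each group is a plain dictionary in optimizer.param_groups. The sketch below assumes two hypothetical stand-in layers and halves every group's rate, which keeps the 10:1 ratio between the groups.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the two layers in the example above.
layer1 = nn.Linear(10, 5)
layer2 = nn.Linear(5, 2)

optimizer = torch.optim.SGD(
    [
        {"params": layer1.parameters(), "lr": 0.01},
        {"params": layer2.parameters(), "lr": 0.001},
    ],
    momentum=0.9,
)

# Halve every group's learning rate; the ratio between groups is unchanged.
for group in optimizer.param_groups:
    group["lr"] *= 0.5

lrs = [group["lr"] for group in optimizer.param_groups]
print(lrs)
```

Editing only one group's 'lr' instead would slow down that part of the model and leave the other untouched.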

Sample Model

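This code shows a simple model with two parts. The optimizer uses a different learning rate for each part. The training loop runs 3 steps on the same batch of dummy data and prints the loss each time. Because the data and the initial weights are random, the printed values differ from run to run.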

PyTorch

import torch
import torch.nn as nn
import torch.optim as optim

# Simple model with two parts
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.part1 = nn.Linear(10, 5)
        self.part2 = nn.Linear(5, 2)

    def forward(self, x):
        x = torch.relu(self.part1(x))
        x = self.part2(x)
        return x

model = SimpleModel()

# Create dummy data
inputs = torch.randn(8, 10)
targets = torch.randint(0, 2, (8,))

# Loss function
criterion = nn.CrossEntropyLoss()

# Optimizer with learning rate differential
optimizer = optim.SGD([
    {'params': model.part1.parameters(), 'lr': 0.001},
    {'params': model.part2.parameters(), 'lr': 0.01}
], momentum=0.9)

# Training loop for 3 epochs
for epoch in range(3):
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")
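
To see the effect of the two learning rates directly, the following sketch takes one optimizer step and compares how far each part's weights moved. It is a simplified variant of the sample above, not the sample itself: it assumes plain SGD without momentum, so that a single step is exactly weight -= lr * grad, and it builds the two layers as standalone modules.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

part1 = nn.Linear(10, 5)
part2 = nn.Linear(5, 2)

# No momentum here (an assumption for this sketch), so the update is -lr * grad.
optimizer = torch.optim.SGD([
    {"params": part1.parameters(), "lr": 0.001},
    {"params": part2.parameters(), "lr": 0.01},
])

inputs = torch.randn(8, 10)
targets = torch.randint(0, 2, (8,))

# Snapshot the weights before the step.
before1 = part1.weight.detach().clone()
before2 = part2.weight.detach().clone()

optimizer.zero_grad()
loss = nn.functional.cross_entropy(part2(torch.relu(part1(inputs))), targets)
loss.backward()
optimizer.step()

# How far each weight matrix actually moved.
delta1 = part1.weight.detach() - before1
delta2 = part2.weight.detach() - before2

# Each part moved by its own learning rate times its own gradient.
step1_matches = torch.allclose(delta1, -0.001 * part1.weight.grad, atol=1e-7)
step2_matches = torch.allclose(delta2, -0.01 * part2.weight.grad, atol=1e-7)
print(step1_matches, step2_matches)
```

With momentum enabled, as in the sample, the update also depends on earlier gradients, so this exact equality holds only for the first step.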


Important Notes

Using different learning rates can help when some layers need slower or faster updates.

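Make sure each parameter goes into the right group. A parameter can belong to only one group (PyTorch raises a ValueError if it appears in more than one), and parameters left out of every group are not updated at all.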

Learning rate differential is common in transfer learning and fine-tuning.
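One way to keep the groups correct in a fine-tuning setup is to split parameters by name and check that none are missing. The sketch below is an illustration under assumptions: the model is hypothetical, with 'base' standing in for pre-trained layers and 'head' for a newly added layer, and the learning rates are example values.

```python
import torch
import torch.nn as nn

# Hypothetical model: 'base' stands in for pre-trained layers, 'head' for a new one.
model = nn.ModuleDict({
    "base": nn.Sequential(nn.Linear(10, 8), nn.ReLU(), nn.Linear(8, 5)),
    "head": nn.Linear(5, 2),
})

# Split parameters by the prefix of their registered name.
base_params = [p for name, p in model.named_parameters() if name.startswith("base.")]
head_params = [p for name, p in model.named_parameters() if name.startswith("head.")]

# Sanity check: the two lists together account for every parameter.
total = sum(1 for _ in model.parameters())
all_covered = len(base_params) + len(head_params) == total

optimizer = torch.optim.Adam([
    {"params": base_params, "lr": 1e-4},  # slow updates for the base
    {"params": head_params, "lr": 1e-3},  # faster updates for the head
])

# Listing the same parameters in two groups is rejected by the optimizer.
try:
    torch.optim.Adam([
        {"params": head_params, "lr": 1e-3},
        {"params": head_params, "lr": 1e-4},
    ])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(all_covered, duplicate_rejected)
```

Building the lists first, rather than passing generators straight to the optimizer, makes it possible to count and inspect the parameters before training starts.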

Summary

Learning rate differential means setting different learning rates for parts of a model.

This helps control how fast each part learns during training.

It is useful for fine-tuning and improving training results.

Practice

1. What does learning rate differential mean in PyTorch training? (easy)

A. Changing the learning rate randomly during training

B. Setting different learning rates for different parts of a model

C. Using the same learning rate for the entire model

D. Freezing all model layers during training
