Learning rate differential means using different learning rates for different parts of a model, so you can control how fast each part's weights change during training.
Learning rate differential in PyTorch
optimizer = torch.optim.SGD([
{'params': model.part1.parameters(), 'lr': 0.001},
{'params': model.part2.parameters(), 'lr': 0.01}
], momentum=0.9)

You pass a list of dictionaries to the optimizer; each dictionary defines a parameter group with its own learning rate.
Each dictionary must have a 'params' key with the parameters for that group. Options such as 'lr' can be set per group, and any option you leave out falls back to the default passed to the optimizer.
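For example, in the sketch below (reusing part1 and part2 from the snippet above) only the second group overrides the learning rate; the first group falls back to the optimizer-wide default of 0.001:

optimizer = torch.optim.SGD([
    {'params': model.part1.parameters()},             # no 'lr' here, so this group uses the default below
    {'params': model.part2.parameters(), 'lr': 0.01}  # overrides the default for this group only
], lr=0.001, momentum=0.9)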
The same pattern works with any optimizer. Here it is with Adam, giving the base of the model a smaller learning rate than the head:

optimizer = torch.optim.Adam([
    {'params': model.base.parameters(), 'lr': 0.0001},
    {'params': model.head.parameters(), 'lr': 0.001}
])

The full example below puts this together: a simple model with two parts, an optimizer with a different learning rate for each part, and a training loop that runs for three epochs and prints the loss after each one.
import torch
import torch.nn as nn
import torch.optim as optim

# Simple model with two parts
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.part1 = nn.Linear(10, 5)
        self.part2 = nn.Linear(5, 2)

    def forward(self, x):
        x = torch.relu(self.part1(x))
        x = self.part2(x)
        return x

model = SimpleModel()

# Create dummy data
inputs = torch.randn(8, 10)
targets = torch.randint(0, 2, (8,))

# Loss function
criterion = nn.CrossEntropyLoss()

# Optimizer with learning rate differential
optimizer = optim.SGD([
    {'params': model.part1.parameters(), 'lr': 0.001},
    {'params': model.part2.parameters(), 'lr': 0.01}
], momentum=0.9)

# Training loop for 3 epochs
for epoch in range(3):
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")
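If you add a learning rate scheduler on top, multiplicative schedulers such as StepLR update every parameter group, so the ratio between the two rates is preserved. A minimal sketch, continuing from the example above:

scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

for epoch in range(3):
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    scheduler.step()  # halves the learning rate of each group after every epoch
    print([group['lr'] for group in optimizer.param_groups])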
Using different learning rates helps when some layers need only small adjustments while others, such as newly added ones, need larger updates.
Make sure the right parameters end up in each group; a parameter that is never passed to the optimizer will not be updated at all.
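A quick sanity check, sketched here for the example model above, is to compare the number of tensors the optimizer holds with the number the model defines, and to print each group's size and learning rate:

# Collect every tensor the optimizer will update, across all groups
grouped = [p for group in optimizer.param_groups for p in group['params']]
assert len(grouped) == len(list(model.parameters())), "some parameters are missing from the optimizer"

for i, group in enumerate(optimizer.param_groups):
    print(f"group {i}: {len(group['params'])} tensors, lr={group['lr']}")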
Learning rate differential is common in transfer learning and fine-tuning, where pretrained layers usually get a smaller learning rate than newly added ones.
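As an illustration, a fine-tuning setup might look like the sketch below. It assumes torchvision is available and uses resnet18 and a 10-class head purely as placeholders; the weight enum requires a reasonably recent torchvision.

import torch.nn as nn
import torch.optim as optim
from torchvision import models

# Load a pretrained backbone and replace its classification head
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 10)  # assumed 10-class task

# Separate the new head's parameters from the pretrained ones
head_params = list(model.fc.parameters())
head_ids = {id(p) for p in head_params}
backbone_params = [p for p in model.parameters() if id(p) not in head_ids]

# Pretrained layers get a small learning rate, the new head a larger one
optimizer = optim.SGD([
    {'params': backbone_params, 'lr': 1e-4},
    {'params': head_params, 'lr': 1e-2}
], momentum=0.9)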
In short, learning rate differential means setting different learning rates for different parts of a model. It gives you control over how fast each part learns during training, and it is especially useful for fine-tuning, where it can improve training results.