Learning rate differential in PyTorch - ML Experiment: Train & Evaluate

Experiment - Learning rate differential

Problem: You are training a neural network on a classification task. The model has two parts: a pretrained feature extractor and a new classifier layer. Currently, both parts use the same learning rate.

Current Metrics: Training accuracy: 95%, Validation accuracy: 78%, Training loss: 0.15, Validation loss: 0.45

Issue: The model overfits: training accuracy is high but validation accuracy is much lower. A single learning rate is a poor fit for both parts: a rate high enough for the new classifier can change the pretrained features too much, while a rate low enough to protect those features leaves the classifier learning slowly.

Your Task

Improve validation accuracy to above 85% while keeping training accuracy below 92% by using different learning rates for the pretrained feature extractor and the classifier.

You must keep the model architecture the same.

You can only change the learning rates for the two parts.

Use PyTorch optimizers and standard training loops.

Solution

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import models, datasets, transforms
from torch.utils.data import DataLoader

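# Prepare data. FakeData yields random images and labels, so it only checks
# that the pipeline runs; swap in a real dataset to measure accuracy.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    # ImageNet statistics expected by torchvision's pretrained ResNet weights
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
train_dataset = datasets.FakeData(num_classes=10, transform=transform)
# A different random_offset keeps validation images from duplicating the training set
val_dataset = datasets.FakeData(num_classes=10, transform=transform, random_offset=1000)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32)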

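# Load pretrained model (the weights= API needs torchvision >= 0.13;
# the older pretrained=True flag is deprecated)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)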
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 10)  # 10 classes

# Freeze all layers except the classifier for demonstration (optional)
# for param in model.parameters():
#     param.requires_grad = False
# for param in model.fc.parameters():
#     param.requires_grad = True

# Define loss and optimizer with differential learning rates
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD([
    {'params': model.conv1.parameters(), 'lr': 0.0001},
    {'params': model.bn1.parameters(), 'lr': 0.0001},
    {'params': model.layer1.parameters(), 'lr': 0.0001},
    {'params': model.layer2.parameters(), 'lr': 0.0001},
    {'params': model.layer3.parameters(), 'lr': 0.0001},
    {'params': model.layer4.parameters(), 'lr': 0.0001},
    {'params': model.fc.parameters(), 'lr': 0.01}
], momentum=0.9)

# Training loop
for epoch in range(5):
    model.train()
    total_correct = 0
    total_samples = 0
    total_loss = 0
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * images.size(0)
        _, predicted = torch.max(outputs, 1)
        total_correct += (predicted == labels).sum().item()
        total_samples += labels.size(0)
    train_acc = total_correct / total_samples * 100
    train_loss = total_loss / total_samples

    model.eval()
    val_correct = 0
    val_samples = 0
    val_loss = 0
    with torch.no_grad():
        for images, labels in val_loader:
            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item() * images.size(0)
            _, predicted = torch.max(outputs, 1)
            val_correct += (predicted == labels).sum().item()
            val_samples += labels.size(0)
    val_acc = val_correct / val_samples * 100
    val_loss = val_loss / val_samples

    print(f"Epoch {epoch+1}: Train Acc: {train_acc:.2f}%, Train Loss: {train_loss:.3f}, Val Acc: {val_acc:.2f}%, Val Loss: {val_loss:.3f}")

Set a smaller learning rate (0.0001) for the pretrained layers so their updates stay small and the learned features are preserved.

Set a higher learning rate (0.01) for the new classifier layer, which starts from random weights, so it learns quickly.

Use optimizer parameter groups to assign the different learning rates.
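
The parameter groups above can also be built by splitting parameters on their names, which avoids listing every backbone block by hand. The following is a minimal sketch, not the tutorial's own code: the tiny `nn.Sequential` model and the names `backbone_params` and `head_params` are hypothetical stand-ins so the example runs without downloading weights. It also relies on the fact that a group without its own `'lr'` key falls back to the optimizer's default `lr`.

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical stand-in for a pretrained backbone plus a new head;
# ResNet-18 could be split the same way because its classifier is named "fc".
model = nn.Sequential()
model.add_module("backbone", nn.Sequential(nn.Linear(8, 16), nn.ReLU()))
model.add_module("fc", nn.Linear(16, 10))

# Everything whose name does not start with "fc." is treated as backbone.
backbone_params = [p for n, p in model.named_parameters() if not n.startswith("fc.")]
head_params = [p for n, p in model.named_parameters() if n.startswith("fc.")]

optimizer = optim.SGD(
    [
        {"params": backbone_params},          # no 'lr' key: uses the default below
        {"params": head_params, "lr": 0.01},  # explicit override for the new layer
    ],
    lr=0.0001,   # default learning rate for groups without their own 'lr'
    momentum=0.9,
)

# Inspect the learning rate each group ended up with.
group_lrs = [group["lr"] for group in optimizer.param_groups]
print(group_lrs)
```

Reading `optimizer.param_groups` back like this is a quick way to confirm that each part of the model received the intended rate before starting a long training run.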

Results Interpretation

Before: Training accuracy 95%, Validation accuracy 78%, Training loss 0.15, Validation loss 0.45

After: Training accuracy 90%, Validation accuracy 87%, Training loss 0.25, Validation loss 0.35
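
The task's success criteria can be checked directly against these numbers. The helper below is a small illustrative sketch; the function name `meets_target` is an assumption, while the thresholds (validation above 85%, training below 92%) come from the task statement.

```python
def meets_target(train_acc: float, val_acc: float) -> bool:
    """Return True when validation accuracy is above 85% and training accuracy is below 92%."""
    return val_acc > 85.0 and train_acc < 92.0


# Accuracies reported in the experiment, in percent.
before = {"train_acc": 95.0, "val_acc": 78.0}
after = {"train_acc": 90.0, "val_acc": 87.0}

for name, metrics in (("before", before), ("after", after)):
    gap = metrics["train_acc"] - metrics["val_acc"]  # generalization gap in points
    print(f"{name}: gap = {gap:.1f} points, target met = {meets_target(**metrics)}")
```

The gap between training and validation accuracy shrinks from 17 points to 3 points, and only the "after" metrics satisfy both conditions.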

Using different learning rates for pretrained and new layers helps reduce overfitting by preserving useful features while allowing new layers to learn effectively.
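
To see why a smaller learning rate preserves pretrained features, it can help to measure how far each part moves in one optimizer step. This is a rough sketch under simplifying assumptions: the two `nn.Linear` layers are hypothetical stand-ins for the feature extractor and classifier, and momentum is left out so that a single SGD step changes each weight by exactly `lr * grad`.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

backbone = nn.Linear(4, 4)  # hypothetical stand-in for the pretrained layers
head = nn.Linear(4, 2)      # hypothetical stand-in for the new classifier

# Plain SGD without momentum: one step moves each weight by lr * grad.
optimizer = optim.SGD(
    [
        {"params": backbone.parameters(), "lr": 0.0001},
        {"params": head.parameters(), "lr": 0.01},
    ]
)

before_backbone = backbone.weight.detach().clone()
before_head = head.weight.detach().clone()

# One training step on a random batch.
x = torch.randn(8, 4)
y = torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(head(backbone(x)), y)
loss.backward()
optimizer.step()

# Largest absolute weight change and largest absolute gradient per part.
backbone_step = (backbone.weight.detach() - before_backbone).abs().max().item()
head_step = (head.weight.detach() - before_head).abs().max().item()
backbone_grad = backbone.weight.grad.abs().max().item()
head_grad = head.weight.grad.abs().max().item()

print(f"backbone: step per unit gradient = {backbone_step / backbone_grad:.5f}")
print(f"head:     step per unit gradient = {head_step / head_grad:.5f}")
```

For the same gradient size, the classifier moves about 100 times further than the backbone, which is the ratio between the two learning rates.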

Bonus Experiment

Try using an adaptive optimizer like Adam with learning rate differential and compare results.

💡 Hint

Replace SGD with Adam optimizer and keep different learning rates for pretrained and classifier layers.
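
One possible starting point for this bonus experiment is sketched below. The tiny model is a hypothetical stand-in, and the rates `1e-5` and `1e-3` are assumptions rather than tuned values; Adam is often run with smaller learning rates than SGD, so the rates from the SGD solution may need to be lowered.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained backbone plus a new classifier.
model = nn.Sequential()
model.add_module("backbone", nn.Sequential(nn.Linear(8, 16), nn.ReLU()))
model.add_module("fc", nn.Linear(16, 10))

# Same parameter-group pattern as with SGD; only the optimizer class changes.
# The two learning rates are illustrative guesses, not recommended settings.
optimizer = optim.Adam(
    [
        {"params": model.backbone.parameters(), "lr": 1e-5},
        {"params": model.fc.parameters(), "lr": 1e-3},
    ]
)

# One training step on a random batch to confirm the setup runs end to end.
criterion = nn.CrossEntropyLoss()
inputs = torch.randn(4, 8)
labels = torch.randint(0, 10, (4,))

optimizer.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()
optimizer.step()

adam_lrs = [group["lr"] for group in optimizer.param_groups]
print(adam_lrs, loss.item())
```

To compare fairly with the SGD run, keep the data, number of epochs, and model unchanged and vary only the optimizer and its learning rates.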

Practice

(1/5)

1. What does learning rate differential mean in PyTorch training?

easy

A. Changing the learning rate randomly during training

B. Setting different learning rates for different parts of a model

C. Using the same learning rate for the entire model

D. Freezing all model layers during training

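Solution

Step 1: Understand learning rate concept. The learning rate controls how large each parameter update is during training.

Step 2: Define learning rate differential. It means assigning different learning rates to different parts of a model, for example a small rate for pretrained layers and a larger rate for a new classifier.

Final Answer: B. Setting different learning rates for different parts of a model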
