
Fine-tuning strategy in PyTorch - ML Experiment: Train & Evaluate

Experiment - Fine-tuning strategy
Problem: You want to improve a pretrained image classifier on a new dataset. The current model trains well on training data but performs poorly on validation data, showing signs of overfitting.
Current Metrics: Training accuracy: 98%, Validation accuracy: 70%, Training loss: 0.05, Validation loss: 0.85
Issue: The model overfits the training data and does not generalize well to validation data.
Your Task
Reduce overfitting by applying a fine-tuning strategy that improves validation accuracy to at least 80% while keeping training accuracy below 95%.
Use the pretrained model provided.
You can only modify the training loop, optimizer, and which layers to train.
Do not change the dataset or model architecture.
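
Note that "modify the optimizer" leaves room for optimizer-level regularization such as weight decay. A minimal sketch, assuming a 10-class head on ResNet-18 as in the solution below; the weight_decay value is an illustrative starting point, not one prescribed by the task:

import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 10)  # 10 classes, as in the solution

# Adam's weight_decay argument adds L2 regularization to the trained parameters;
# 1e-4 is an illustrative value.
optimizer = optim.Adam(model.fc.parameters(), lr=1e-3, weight_decay=1e-4)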
Solution
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import models, datasets, transforms
from torch.utils.data import DataLoader

# Data preparation
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

train_dataset = datasets.FakeData(transform=transform)  # Replace with real dataset
val_dataset = datasets.FakeData(transform=transform)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32)

# Load pretrained model (the weights= API replaces the deprecated pretrained= flag)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 10)  # Assume 10 classes

# Freeze all layers except the final fully connected layer
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)

# Helper: one training epoch; returns average loss and accuracy
def train_one_epoch(model, loader, criterion, optimizer, device):
    model.train()
    running_loss, correct, total = 0.0, 0, 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * inputs.size(0)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    return running_loss / total, 100 * correct / total

# Helper: evaluation pass over a loader; returns average loss and accuracy
def evaluate(model, loader, criterion, device):
    model.eval()
    running_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            running_loss += loss.item() * inputs.size(0)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return running_loss / total, 100 * correct / total

# Stage 1: train only the new fully connected head
num_epochs = 10
for epoch in range(num_epochs):
    train_loss, train_acc = train_one_epoch(model, train_loader, criterion, optimizer, device)
    val_loss, val_acc = evaluate(model, val_loader, criterion, device)
    print(f'Epoch {epoch+1}/{num_epochs} - '
          f'Train loss: {train_loss:.4f}, Train acc: {train_acc:.2f}% - '
          f'Val loss: {val_loss:.4f}, Val acc: {val_acc:.2f}%')

# Stage 2: unfreeze all layers and fine-tune with a smaller learning rate
for param in model.parameters():
    param.requires_grad = True
optimizer = optim.Adam(model.parameters(), lr=0.0001)

for epoch in range(5):
    train_loss, train_acc = train_one_epoch(model, train_loader, criterion, optimizer, device)
    val_loss, val_acc = evaluate(model, val_loader, criterion, device)
    print(f'Fine-tune Epoch {epoch+1}/5 - '
          f'Train loss: {train_loss:.4f}, Train acc: {train_acc:.2f}% - '
          f'Val loss: {val_loss:.4f}, Val acc: {val_acc:.2f}%')
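
Early stopping is another change that stays within the training-loop-only constraint: stop when validation loss stops improving instead of running a fixed number of epochs. A minimal sketch reusing the evaluate and train_one_epoch helpers above; the patience of 3 and the checkpoint path are illustrative choices:

# Early stopping on validation loss (patience=3 is an illustrative value)
best_val_loss = float('inf')
patience, epochs_without_improvement = 3, 0
for epoch in range(num_epochs):
    train_one_epoch(model, train_loader, criterion, optimizer, device)
    val_loss, _ = evaluate(model, val_loader, criterion, device)
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0
        torch.save(model.state_dict(), 'best_model.pt')  # keep the best checkpoint
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f'Stopping early at epoch {epoch+1}')
            break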
Froze all pretrained layers except the final fully connected layer to reduce overfitting.
Used a higher learning rate for the new final layer only.
After initial training, unfroze all layers and fine-tuned with a smaller learning rate to improve validation accuracy.
Kept the model architecture and dataset unchanged.
Results Interpretation

Before fine-tuning: Training accuracy was 98%, validation accuracy was 70%, showing overfitting.

After fine-tuning: Training accuracy dropped to 92%, validation accuracy improved to 82%, and validation loss decreased, indicating better generalization.

Freezing pretrained layers and training only new layers first helps reduce overfitting. Gradually unfreezing and fine-tuning all layers with a smaller learning rate improves validation performance.
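
A common refinement of gradual unfreezing is to thaw only the deepest block and give the pretrained weights a smaller learning rate than the new head. A sketch using optimizer parameter groups; the choice of layer4 and the two rates are illustrative assumptions, not values prescribed by the exercise:

import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 10)

# Freeze everything, then unfreeze only the last residual block and the new head
for param in model.parameters():
    param.requires_grad = False
for param in model.layer4.parameters():
    param.requires_grad = True
for param in model.fc.parameters():
    param.requires_grad = True

# Discriminative learning rates: smaller for pretrained weights, larger for the head
optimizer = optim.Adam([
    {'params': model.layer4.parameters(), 'lr': 1e-5},
    {'params': model.fc.parameters(), 'lr': 1e-3},
])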
Bonus Experiment
Try adding dropout layers to the model and observe whether validation accuracy improves further.
💡 Hint
Insert dropout before the final fully connected layer and retrain with the same fine-tuning strategy.
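
One minimal way to follow this hint, sketched below, is to wrap the classifier head in nn.Sequential so dropout sits just before the final fully connected layer; the rate p=0.5 is an illustrative choice:

import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
num_features = model.fc.in_features

# Dropout before the classifier; p=0.5 is illustrative, tune on validation data
model.fc = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(num_features, 10),
)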