
Fine-tuning strategy in PyTorch - ML Experiment: Train & Evaluate

Experiment - Fine-tuning strategy
Problem: You want to improve a pretrained image classifier on a new dataset. The current model fits the training data well but performs poorly on validation data, a sign of overfitting.
Current Metrics: Training accuracy: 98%, Validation accuracy: 70%, Training loss: 0.05, Validation loss: 0.85
Issue: The model overfits the training data and does not generalize well to validation data.
Your Task
Reduce overfitting by applying a fine-tuning strategy that improves validation accuracy to at least 80% while keeping training accuracy below 95%.
Use the pretrained model provided.
You can only modify the training loop, optimizer, and which layers to train.
Do not change the dataset or model architecture.
Solution
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import models, datasets, transforms
from torch.utils.data import DataLoader

# Data preparation
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

train_dataset = datasets.FakeData(transform=transform)  # Replace with real dataset
val_dataset = datasets.FakeData(transform=transform)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32)

# Load pretrained model
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # `pretrained=True` is deprecated since torchvision 0.13
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 10)  # Assume 10 classes

# Freeze all layers except the final fully connected layer
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * inputs.size(0)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    train_loss = running_loss / total
    train_acc = 100 * correct / total

    model.eval()
    val_loss = 0.0
    val_correct = 0
    val_total = 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            val_loss += loss.item() * inputs.size(0)
            _, predicted = torch.max(outputs, 1)
            val_total += labels.size(0)
            val_correct += (predicted == labels).sum().item()
    val_loss /= val_total
    val_acc = 100 * val_correct / val_total

    print(f'Epoch {epoch+1}/{num_epochs} - '
          f'Train loss: {train_loss:.4f}, Train acc: {train_acc:.2f}% - '
          f'Val loss: {val_loss:.4f}, Val acc: {val_acc:.2f}%')

# Optional: Unfreeze all layers and fine-tune with a smaller learning rate
for param in model.parameters():
    param.requires_grad = True
optimizer = optim.Adam(model.parameters(), lr=0.0001)

for epoch in range(5):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * inputs.size(0)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    train_loss = running_loss / total
    train_acc = 100 * correct / total

    model.eval()
    val_loss = 0.0
    val_correct = 0
    val_total = 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            val_loss += loss.item() * inputs.size(0)
            _, predicted = torch.max(outputs, 1)
            val_total += labels.size(0)
            val_correct += (predicted == labels).sum().item()
    val_loss /= val_total
    val_acc = 100 * val_correct / val_total

    print(f'Fine-tune Epoch {epoch+1}/5 - '
          f'Train loss: {train_loss:.4f}, Train acc: {train_acc:.2f}% - '
          f'Val loss: {val_loss:.4f}, Val acc: {val_acc:.2f}%')
Froze all pretrained layers except the final fully connected layer to reduce overfitting.
Trained only the new final layer first, with a higher learning rate (0.001).
After this initial training, unfroze all layers and fine-tuned with a smaller learning rate (0.0001) to improve validation accuracy.
Kept the model architecture and dataset unchanged.
Results Interpretation

Before fine-tuning: Training accuracy was 98%, validation accuracy was 70%, showing overfitting.

After fine-tuning: Training accuracy dropped to 92%, validation accuracy improved to 82%, and validation loss decreased, indicating better generalization.

Freezing the pretrained layers and training only the new layer first helps reduce overfitting. Unfreezing the remaining layers afterwards (all at once in this solution, or block by block) and fine-tuning them with a smaller learning rate improves validation performance.
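The solution unfreezes every layer at once in its second stage. A more gradual variant unfreezes only the last backbone block and gives it a smaller learning rate than the head. The sketch below is a minimal illustration under assumptions: `TinyNet` and `set_trainable` are hypothetical stand-ins (not part of the exercise), and the learning rates are just the two values used in the solution.

```python
import torch.nn as nn
import torch.optim as optim


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pretrained network: two backbone blocks and a head."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(16, num_classes)

    def forward(self, x):
        x = self.block2(self.block1(x))
        return self.fc(self.pool(x).flatten(1))


def set_trainable(module, flag):
    """Hypothetical helper: freeze (False) or unfreeze (True) every parameter of a module."""
    for p in module.parameters():
        p.requires_grad = flag


model = TinyNet()

# Stage 1: train the head only.
set_trainable(model, False)
set_trainable(model.fc, True)
stage1_opt = optim.Adam(model.fc.parameters(), lr=1e-3)

# Stage 2: also unfreeze the last backbone block, with a smaller learning rate
# for it than for the head (per-parameter-group options of the optimizer).
set_trainable(model.block2, True)
stage2_opt = optim.Adam([
    {"params": model.block2.parameters(), "lr": 1e-4},
    {"params": model.fc.parameters(), "lr": 1e-3},
])

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

With a real ResNet the same idea would apply to its later blocks; which blocks to unfreeze and when is a judgment call that depends on the dataset.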
Bonus Experiment
Try adding dropout layers to the model and observe if validation accuracy improves further.
💡 Hint
Insert dropout before the final fully connected layer and retrain with the same fine-tuning strategy.
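One way to follow this hint is to replace the single linear layer with a small head that applies dropout first. The sketch below builds such a head on its own so it runs without downloading weights. The feature size 512 matches `resnet18`'s `fc.in_features`, and the dropout probability 0.5 is an assumed starting value, not something the exercise prescribes.

```python
import torch
import torch.nn as nn

num_features = 512  # resnet18's fc.in_features
num_classes = 10    # same assumption as the solution code

# Dropout followed by the classifier. p=0.5 is an assumed starting point to tune.
head = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(num_features, num_classes),
)

# In the solution you would assign `model.fc = head`. The existing loop over
# `model.fc.parameters()` still unfreezes the right parameters.

features = torch.randn(4, num_features)  # fake pooled features for a batch of 4

head.train()               # dropout active: random units are zeroed
train_out = head(features)

head.eval()                # dropout disabled: output is deterministic
eval_a = head(features)
eval_b = head(features)

print(train_out.shape)
print(torch.equal(eval_a, eval_b))
```

Because dropout only acts in training mode, the `model.train()` and `model.eval()` calls already present in the training loop matter even more once this head is in place.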

Practice

1. What is the main purpose of fine-tuning a pre-trained PyTorch model?
easy
A. To adjust the model to perform well on a new task by training some layers
B. To train the model from scratch on a large dataset
C. To reduce the model size by removing layers
D. To convert the model to a different programming language

Solution

  1. Step 1: Understand fine-tuning concept

    Fine-tuning means taking a model already trained on one task and adjusting it to work well on a new task by training some of its layers.
  2. Step 2: Compare options

    Only option A, "To adjust the model to perform well on a new task by training some layers", describes this process correctly. The other options describe unrelated actions: training from scratch, model compression, and porting to another language.
  3. Final Answer:

    To adjust the model to perform well on a new task by training some layers -> Option A
  4. Quick Check:

    Fine-tuning = Adjust model layers for new task [OK]
Hint: Fine-tuning means training some layers for a new task [OK]
Common Mistakes:
  • Thinking fine-tuning means training from scratch
  • Confusing fine-tuning with model compression
  • Assuming fine-tuning changes the whole model
2. Which PyTorch code snippet correctly freezes all layers except the last one for fine-tuning?
easy
A. model.freeze_all_layers()
   model.unfreeze_last_layer()
B. for param in model.parameters():
       param.requires_grad = True
   for param in model.fc.parameters():
       param.requires_grad = False
C. model.requires_grad = False
   model.fc.requires_grad = True
D. for param in model.parameters():
       param.requires_grad = False
   for param in model.fc.parameters():
       param.requires_grad = True

Solution

  1. Step 1: Understand freezing layers in PyTorch

    Setting param.requires_grad = False freezes a layer so it won't update during training.
  2. Step 2: Analyze code snippets

    Option D freezes all parameters first, then unfreezes only the last layer (model.fc). Option B reverses the flags, option C sets an attribute on the module that does not propagate to its parameters, and option A calls methods that do not exist in PyTorch.
  3. Final Answer:

    Freeze every parameter, then set requires_grad = True for the parameters of model.fc -> Option D
  4. Quick Check:

    Freeze all, unfreeze last layer = Option D [OK]
Hint: Freeze all with requires_grad=False, then unfreeze last layer [OK]
Common Mistakes:
  • Setting requires_grad True for all layers by mistake
  • Using non-existent PyTorch methods
  • Forgetting to unfreeze the last layer
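A quick way to see why assigning `requires_grad` on the module itself does not freeze anything, while the parameter loop does, is to try both on a tiny model. In the sketch below the `nn.Sequential` model is a stand-in (assumed for illustration), and `model[2]` plays the role of `model.fc`.

```python
import torch.nn as nn

# Stand-in model; model[2] plays the role of the final layer (model.fc).
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Assigning an attribute on the module only creates a plain Python attribute.
# The parameters themselves are untouched and remain trainable.
model.requires_grad = False
still_trainable = all(p.requires_grad for p in model.parameters())
print(still_trainable)

# The parameter loop is what actually changes the flags.
for param in model.parameters():
    param.requires_grad = False
for param in model[2].parameters():
    param.requires_grad = True

flags = [p.requires_grad for p in model.parameters()]
print(flags)  # order: first layer weight, first layer bias, last layer weight, last layer bias
```

`nn.Module.requires_grad_(False)` (note the trailing underscore) is a built-in method that has the same effect as the freezing loop.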
3. Given this PyTorch code for fine-tuning, what will be the output of print(sum(p.requires_grad for p in model.parameters()))?
for param in model.parameters():
    param.requires_grad = False
for param in model.classifier.parameters():
    param.requires_grad = True
print(sum(p.requires_grad for p in model.parameters()))
medium
A. Number of all model parameters
B. Number of parameters in model.classifier
C. Zero
D. Raises an error

Solution

  1. Step 1: Understand requires_grad flags

    All parameters are first frozen (requires_grad=False). Then only parameters in model.classifier are unfrozen (requires_grad=True).
  2. Step 2: Calculate sum of requires_grad

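    Summing p.requires_grad counts how many parameter tensors are trainable: each weight or bias tensor contributes 1, regardless of how many scalar values it holds. Since only the model.classifier parameters are True, the sum equals the number of parameter tensors in model.classifier.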
  3. Final Answer:

    Number of parameters in model.classifier -> Option B
  4. Quick Check:

    Only classifier params require grad = Number of parameters in model.classifier [OK]
Hint: Sum requires_grad counts trainable parameters [OK]
Common Mistakes:
  • Assuming all parameters are trainable
  • Confusing boolean sum with total parameters
  • Expecting an error from this code
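The difference between counting parameter tensors and counting scalar weights is easy to check on a small model. The `Tiny` class below is a hypothetical stand-in with a `classifier` attribute, used only for illustration.

```python
import torch.nn as nn


class Tiny(nn.Module):
    """Hypothetical model with a `classifier` attribute, for illustration only."""

    def __init__(self):
        super().__init__()
        self.features = nn.Linear(4, 8)
        self.classifier = nn.Linear(8, 3)

    def forward(self, x):
        return self.classifier(self.features(x))


model = Tiny()
for param in model.parameters():
    param.requires_grad = False
for param in model.classifier.parameters():
    param.requires_grad = True

# Each parameter tensor contributes True (1) or False (0) to the sum.
n_tensors = sum(p.requires_grad for p in model.parameters())

# Counting scalar values needs numel() instead.
n_elements = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(n_tensors)   # 2: the classifier's weight and bias
print(n_elements)  # 27: 8 * 3 weights + 3 biases
```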
4. You tried to fine-tune a model by freezing layers but the training loss does not change. What is the most likely error in your PyTorch code?
medium
A. You used the wrong optimizer
B. You forgot to set model.train() before training
C. You did not set requires_grad = True for any parameters
D. You replaced the last layer with wrong output size

Solution

  1. Step 1: Analyze symptom - loss not changing

    If loss stays the same, model parameters are not updating during training.
  2. Step 2: Check requires_grad flags

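    If the parameters you meant to train still have requires_grad = False, no gradients are computed for them and the optimizer never updates them, so the loss stays flat. (When every parameter is frozen, loss.backward() usually fails with a RuntimeError, which makes the mistake easier to spot.)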
  3. Final Answer:

    You did not set requires_grad = True for any parameters -> Option C
  4. Quick Check:

    No trainable params = no loss change [OK]
Hint: Check requires_grad True for trainable layers [OK]
Common Mistakes:
  • Assuming optimizer choice causes no loss change
  • Forgetting to call model.train() but blaming loss
  • Ignoring requires_grad flags
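The symptom can be reproduced on a toy model. The sketch below, with an assumed `nn.Sequential` stand-in and random data, first freezes everything and then unfreezes only the last layer, checking which weights actually move after one optimizer step.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Stand-in model and random data, assumed for illustration.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
inputs = torch.randn(16, 4)
labels = torch.randint(0, 2, (16,))
criterion = nn.CrossEntropyLoss()

# Case 1: every parameter frozen. The loss has no graph to backpropagate through.
for param in model.parameters():
    param.requires_grad = False
try:
    criterion(model(inputs), labels).backward()
    backward_failed = False
except RuntimeError:
    backward_failed = True

# Case 2: unfreeze only the last layer and take one optimizer step.
for param in model[2].parameters():
    param.requires_grad = True
optimizer = optim.SGD(model[2].parameters(), lr=0.1)

head_before = model[2].weight.detach().clone()
backbone_before = model[0].weight.detach().clone()

optimizer.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()
optimizer.step()

head_changed = not torch.equal(head_before, model[2].weight)
backbone_changed = not torch.equal(backbone_before, model[0].weight)
print(backward_failed, head_changed, backbone_changed)
```

Comparing a copy of the weights before and after a step, as done here, is a cheap sanity check to run before a long training job.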
5. You want to fine-tune a pre-trained ResNet model on a 10-class problem. Which strategy is best to start with?
hard
A. Freeze all layers, replace the final fully connected layer with 10 outputs, and train only this layer
B. Train the entire ResNet model from scratch with 10 output classes
C. Freeze only the first convolutional layer and train the rest
D. Replace the final layer but keep all layers trainable without freezing

Solution

  1. Step 1: Understand common fine-tuning approach

    Starting by freezing all layers except the last layer is a common strategy to adapt a pre-trained model to a new task efficiently.
  2. Step 2: Evaluate options

    Option A matches this approach: freeze all layers, replace the last layer with one that has 10 outputs, and train only that layer. The other options either train from scratch or do not freeze enough layers, which can be inefficient or unstable.
  3. Final Answer:

    Freeze all layers, replace the final fully connected layer with 10 outputs, and train only this layer -> Option A
  4. Quick Check:

    Freeze all but last layer for new task [OK]
Hint: Freeze all, replace last layer, train only it first [OK]
Common Mistakes:
  • Training entire model from scratch unnecessarily
  • Freezing too few layers causing slow training
  • Not replacing last layer to match output classes