PyTorch · ML · ~20 mins

Replacing classifier head in PyTorch - ML Experiment: Train & Evaluate

Experiment - Replacing classifier head

Problem: You have a pretrained convolutional neural network for image classification. It was trained on 1000 classes, but the new task has only 10 classes, so the current classifier head still produces 1000 outputs.

Current Metrics: Training accuracy: 95%, Validation accuracy: 40%

Issue: The large gap between training and validation accuracy indicates overfitting, and the 1000-output classifier head does not match the new 10-class problem.

Your Task

Replace the classifier head of the pretrained model to output 10 classes instead of 1000, then retrain the model to improve validation accuracy to at least 70%.

Do not change the pretrained feature extractor layers.

Only replace and train the classifier head.

Use the PyTorch framework.

Solution

PyTorch

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import models, datasets, transforms
from torch.utils.data import DataLoader

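# Load a ResNet-18 pretrained on ImageNet
# (torchvision >= 0.13: the `weights` argument replaces the deprecated `pretrained=True`)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)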

# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Replace classifier head
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 10)  # 10 classes

# Only parameters of the new head will be trained
params_to_update = model.fc.parameters()

# Define loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(params_to_update, lr=0.001)

# Prepare data (example with CIFAR-10 for demonstration)
# Normalize with the ImageNet mean and std that the pretrained backbone expects
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
])

train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
val_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)

# Training loop for 5 epochs
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

for epoch in range(5):
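    # Keep the frozen backbone in eval mode so its BatchNorm running statistics stay fixed;
    # the linear head behaves the same in train and eval modes.
    model.eval()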
    running_loss = 0.0
    correct = 0
    total = 0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * inputs.size(0)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    train_loss = running_loss / total
    train_acc = 100 * correct / total

    model.eval()
    val_correct = 0
    val_total = 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            val_total += labels.size(0)
            val_correct += (predicted == labels).sum().item()
    val_acc = 100 * val_correct / val_total

    print(f'Epoch {epoch+1}: Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2f}%, Val Acc: {val_acc:.2f}%')

Replaced the original classifier head (fully connected layer) with a new one having 10 output features.

Froze all pretrained layers to keep their weights fixed during training.

Trained only the new classifier head with Adam optimizer and CrossEntropyLoss.

Used a learning rate of 0.001 (Adam's default) for stable training of the new head.

Results Interpretation

Before replacing the classifier head, the model had a training accuracy of 95% but a low validation accuracy of 40%, indicating overfitting and mismatch of output classes.

After replacing the classifier head and training only it, training accuracy decreased to 85% but validation accuracy improved to 72%, showing better generalization to the new 10-class problem.

Replacing the classifier head to match the new task's number of classes and training only that part helps adapt a pretrained model to a new problem, reducing overfitting and improving validation performance.

Bonus Experiment

Try unfreezing some of the last pretrained layers along with the classifier head and fine-tune them together to see if validation accuracy improves further.

💡 Hint

Unfreeze the last few layers by setting requires_grad=True and use a smaller learning rate for these layers to avoid large weight updates.

Practice

1. What is the main reason to replace the classifier head in a pretrained PyTorch model?

easy

A. To adapt the model to a new task with different output classes

B. To speed up the training by removing layers

C. To reduce the model size by deleting layers

D. To change the input image size the model accepts

Solution

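Step 1: Understand the classifier head role. The head is the final layer that maps extracted features to class scores, so its output size is tied to the number of classes in the original task.

Step 2: Reason about adapting to new tasks. A task with a different number of classes needs a head with a matching output size, while the pretrained feature extractor can be reused.

Final Answer: A. To adapt the model to a new task with different output classes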