
Replacing classifier head in PyTorch - ML Experiment: Train & Evaluate

Experiment - Replacing classifier head
Problem: You have a pretrained convolutional neural network for image classification. The model was trained on 1000 classes, but you want to use it for a new task with only 10 classes. The current classifier head outputs 1000 classes.
Current Metrics: Training accuracy: 95%, Validation accuracy: 40%
Issue: The model overfits the training data and performs poorly on validation because the classifier head is not adapted to the new 10-class problem.
Your Task
Replace the classifier head of the pretrained model to output 10 classes instead of 1000, then retrain the model to improve validation accuracy to at least 70%.
Do not change the pretrained feature extractor layers.
Only replace and train the classifier head.
Use PyTorch framework.
Solution
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import models, datasets, transforms
from torch.utils.data import DataLoader

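# Load a ResNet-18 pretrained on ImageNet
# (the weights argument replaces the deprecated pretrained=True in torchvision 0.13 and later)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)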

# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Replace classifier head
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 10)  # 10 classes

# Only parameters of the new head will be trained
params_to_update = model.fc.parameters()

# Define loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(params_to_update, lr=0.001)

# Prepare data (example with CIFAR-10 for demonstration)
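# Pretrained torchvision weights expect inputs normalized with the ImageNet mean and std
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
])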

train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
val_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)

# Training loop for 5 epochs
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

for epoch in range(5):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * inputs.size(0)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    train_loss = running_loss / total
    train_acc = 100 * correct / total

    model.eval()
    val_correct = 0
    val_total = 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            val_total += labels.size(0)
            val_correct += (predicted == labels).sum().item()
    val_acc = 100 * val_correct / val_total

    print(f'Epoch {epoch+1}: Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2f}%, Val Acc: {val_acc:.2f}%')
Replaced the original classifier head (fully connected layer) with a new one having 10 output features.
Froze all pretrained layers to keep their weights fixed during training.
Trained only the new classifier head with Adam optimizer and CrossEntropyLoss.
Used a learning rate of 0.001 (Adam's default) to train the new head.
Results Interpretation

Before replacing the classifier head, the model had a training accuracy of 95% but a low validation accuracy of 40%, indicating overfitting and mismatch of output classes.

After replacing the classifier head and training only it, training accuracy decreased to 85% but validation accuracy improved to 72%, showing better generalization to the new 10-class problem.

Replacing the classifier head to match the new task's number of classes and training only that part helps adapt a pretrained model to a new problem, reducing overfitting and improving validation performance.
Bonus Experiment
Try unfreezing some of the last pretrained layers along with the classifier head and fine-tune them together to see if validation accuracy improves further.
💡 Hint
Unfreeze the last few layers by setting requires_grad=True and use a smaller learning rate for these layers to avoid large weight updates.

Practice

(1/5)
1. What is the main reason to replace the classifier head in a pretrained PyTorch model?
easy
A. To adapt the model to a new task with different output classes
B. To speed up the training by removing layers
C. To reduce the model size by deleting layers
D. To change the input image size the model accepts

Solution

  1. Step 1: Understand the classifier head role

    The classifier head is the last layer that decides the output classes based on learned features.
  2. Step 2: Reason about adapting to new tasks

    Replacing the classifier head allows the model to output predictions for new classes different from the original training.
  3. Final Answer:

    To adapt the model to a new task with different output classes -> Option A
  4. Quick Check:

    Classifier head replacement = new task adaptation [OK]
Hint: Classifier head controls output classes, replace for new tasks [OK]
Common Mistakes:
  • Thinking replacing head changes input size
  • Assuming it reduces model size significantly
  • Believing it speeds up training by removing layers
2. Which of the following is the correct way to replace the classifier head of a pretrained ResNet50 model in PyTorch for 10 output classes?
easy
A. model.fc = nn.Linear(2048, 10)
B. model.classifier = nn.Linear(2048, 10)
C. model.fc = nn.Linear(512, 10)
D. model.head = nn.Linear(512, 10)

Solution

  1. Step 1: Identify ResNet classifier attribute

    ResNet models use model.fc as the classifier head.
  2. Step 2: Check input feature size for ResNet

    ResNet50 produces 2048 features before the classifier, so the input size is 2048. Smaller variants such as ResNet18 and ResNet34 produce 512.
  3. Final Answer:

    model.fc = nn.Linear(2048, 10) -> Option A
  4. Quick Check:

    ResNet50 classifier = model.fc with 2048 input features [OK]
Hint: ResNet50 classifier is model.fc with 2048 input features [OK]
Common Mistakes:
  • Using wrong attribute like model.classifier or model.head
  • Using the wrong input size, such as 512 (correct for ResNet18/34) instead of 2048
  • Confusing ResNet with other models like VGG
3. Given the code below, what will be the output shape of the model's final layer after replacement?
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)
model.fc = nn.Linear(512, 5)

input_tensor = torch.randn(1, 3, 224, 224)
output = model(input_tensor)
print(output.shape)
medium
A. torch.Size([1, 1000])
B. torch.Size([1, 512])
C. torch.Size([1, 5])
D. torch.Size([3, 224, 224])

Solution

  1. Step 1: Understand the replaced classifier output size

    The new classifier layer outputs 5 values per input (5 classes).
  2. Step 2: Check input batch size and output shape

    Input batch size is 1, so output shape is (1, 5).
  3. Final Answer:

    torch.Size([1, 5]) -> Option C
  4. Quick Check:

    Output shape = (batch_size, output_classes) = (1, 5) [OK]
Hint: Output shape matches batch size and new class count [OK]
Common Mistakes:
  • Expecting original 1000 classes output
  • Confusing feature size with output size
  • Misreading input tensor shape as output
4. You tried replacing the classifier head of a pretrained model with model.fc = nn.Linear(1024, 10) but got a runtime error during training. What is the most likely cause?
medium
A. The model.fc attribute does not exist in pretrained models
B. The output size 10 is too large for the model
C. You forgot to call model.eval() before training
D. The input feature size 1024 does not match the model's actual output features

Solution

  1. Step 1: Check input feature size for classifier

    The input size to the new Linear layer must match the output features of the previous layer.
  2. Step 2: Identify mismatch causing runtime error

    If 1024 is incorrect, the model will raise size mismatch errors during forward pass.
  3. Final Answer:

    The input feature size 1024 does not match the model's actual output features -> Option D
  4. Quick Check:

    Input size mismatch causes runtime error [OK]
Hint: Match Linear input size to previous layer output features [OK]
Common Mistakes:
  • Assuming output size causes error
  • Confusing eval mode with training errors
  • Thinking model.fc is missing in pretrained models
5. You want to fine-tune a pretrained ResNet50 on a dataset with 15 classes. Which code snippet correctly replaces the classifier head and freezes all layers except the new head?
hard
A.
model = models.resnet50(pretrained=True)
model.fc = nn.Linear(2048, 15)
for param in model.parameters():
    param.requires_grad = False
B.
model = models.resnet50(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(2048, 15)
C.
model = models.resnet50(pretrained=True)
for param in model.fc.parameters():
    param.requires_grad = False
model.fc = nn.Linear(2048, 15)
D.
model = models.resnet50(pretrained=True)
model.fc = nn.Linear(512, 15)
for param in model.parameters():
    param.requires_grad = True

Solution

  1. Step 1: Freeze all existing model parameters

    Set param.requires_grad = False for all parameters to prevent updates during training.
  2. Step 2: Replace classifier head with correct input/output sizes

    ResNet50's classifier input size is 2048; output size is 15 for new classes.
  3. Step 3: Ensure new head parameters are trainable

    By replacing model.fc after freezing, new layer parameters default to requires_grad=True.
  4. Final Answer:

    Freeze all params, then replace head with nn.Linear(2048, 15) -> Option B
  5. Quick Check:

    Freeze old layers, replace head with correct sizes [OK]
Hint: Freeze before replacing head to keep new layer trainable [OK]
Common Mistakes:
  • Freezing after replacing head disables new layer training
  • Using wrong input size 512 instead of 2048
  • Not freezing any layers when fine-tuning