When you replace a classifier head in a model, you need to check how well the new head predicts the correct classes. The key metrics are accuracy, which measures overall correctness, and precision and recall, which show how many predicted positives are correct and how many actual positives are found. Together they tell you whether the new head is learning properly and making good predictions.
Replacing classifier head in PyTorch - Model Metrics & Evaluation
Confusion Matrix (for 100 samples):
              Predicted Pos   Predicted Neg
Actual Pos         40              10
Actual Neg          5              45
TP = 40, FP = 5, TN = 45, FN = 10
Precision = TP / (TP + FP) = 40 / (40 + 5) ≈ 0.89
Recall = TP / (TP + FN) = 40 / (40 + 10) = 0.80
Accuracy = (TP + TN) / Total = (40 + 45) / 100 = 0.85
F1 Score = 2 * (Precision * Recall) / (Precision + Recall) ≈ 0.84
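The arithmetic above can be reproduced in a few lines of plain Python. This is a minimal sketch: the variable names are illustrative, and the counts are simply those of the confusion matrix above.

```python
# Counts taken from the confusion matrix above (100 samples).
tp, fp, tn, fn = 40, 5, 45, 10

precision = tp / (tp + fp)                  # share of predicted positives that are correct
recall = tp / (tp + fn)                     # share of actual positives that were found
accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of all predictions that are correct
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"accuracy={accuracy:.2f} f1={f1:.2f}")
# precision=0.89 recall=0.80 accuracy=0.85 f1=0.84
```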
If your new classifier head is used for spam detection, high precision is important: you want to avoid marking good emails as spam (false positives), so keeping false alarms low matters more than catching every spam message.
If it is used for medical diagnosis, like cancer detection, high recall is critical. You want to catch as many true cases as possible, even if some false alarms happen.
Replacing the classifier head can change this balance. You must check which metric fits your goal and tune the head accordingly.
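One common way to shift the precision/recall balance without retraining is to move the decision threshold applied to the head's positive-class probability. The sketch below assumes a binary head and uses made-up scores and labels (both hypothetical, not outputs of a real model) to show the trade-off.

```python
# Hypothetical positive-class probabilities from a new head, with true labels.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

def precision_recall(threshold):
    """Return (precision, recall) when scores >= threshold count as positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for t in (0.25, 0.50, 0.75):
    p, r = precision_recall(t)
    print(f"threshold={t:.2f} precision={p:.3f} recall={r:.3f}")
```

On this toy data, raising the threshold from 0.25 to 0.75 lifts precision from 0.625 to 1.0 while recall drops from 1.0 to 0.6, which is the direction a spam filter would lean; a medical screening model would lean the other way.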
Good (as a rough guide; the right thresholds depend on the task and class balance): Accuracy above 80%, precision and recall both above 75%, and an F1 score close to both. This means the new head predicts well and finds most true cases without many mistakes.
Bad: Accuracy below 60%, or very low precision (e.g., 30%) or recall (e.g., 20%). This shows the new head is not learning well or is biased, missing many true cases or making many wrong predictions.
- Accuracy paradox: High accuracy can be misleading if classes are imbalanced. For example, if 90% of data is one class, predicting that class always gives 90% accuracy but poor real performance.
- Data leakage: If test data leaks into training, metrics look too good but the model fails in real use.
- Overfitting: The new head may memorize training data but perform poorly on new data. Watch for big gaps between training and validation metrics.
- Ignoring class balance: Metrics like precision and recall per class matter more than overall accuracy when classes differ in size.
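The accuracy paradox from the list above is easy to reproduce. The sketch below assumes a hypothetical validation set with a 90/10 class split and a degenerate "model" that always predicts the majority class; per-class recall exposes what overall accuracy hides.

```python
# Hypothetical imbalanced validation set: 90 negatives, 10 positives.
labels = [0] * 90 + [1] * 10
# A degenerate "model" that always predicts the majority class.
preds = [0] * len(labels)

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)

def recall_for(cls):
    """Fraction of samples of class `cls` that were predicted as `cls`."""
    hits = sum(p == cls and y == cls for p, y in zip(preds, labels))
    total = sum(y == cls for y in labels)
    return hits / total

print(f"accuracy={accuracy:.2f}")               # accuracy=0.90
print(f"recall(negative)={recall_for(0):.2f}")  # recall(negative)=1.00
print(f"recall(positive)={recall_for(1):.2f}")  # recall(positive)=0.00
```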
Your model with the replaced classifier head has 98% accuracy but only 12% recall on the positive class (e.g., fraud). Is it good for production? Why or why not?
Answer: No, it is not good. The low recall means the model misses most positive cases (fraud). Even with high accuracy, it fails to find the important cases. For fraud detection, high recall is critical to catch frauds.
Practice
Solution
Step 1: Understand the classifier head role
The classifier head is the last layer that decides the output classes based on learned features.
Step 2: Reason about adapting to new tasks
Replacing the classifier head allows the model to output predictions for new classes different from the original training.
Final Answer:
To adapt the model to a new task with different output classes -> Option A
Quick Check:
Classifier head replacement = new task adaptation [OK]
- Thinking replacing head changes input size
- Assuming it reduces model size significantly
- Believing it speeds up training by removing layers
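To see why these are mistakes, it can help to swap the head on a small model and check what changes. The sketch below uses `TinyNet`, a hypothetical stand-in for a pretrained network (not a real torchvision model): the same input tensor works before and after the swap, and only the number of outputs differs.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical stand-in for a pretrained model: feature extractor + head."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # -> (batch, 8, 1, 1)
            nn.Flatten(),              # -> (batch, 8)
        )
        self.fc = nn.Linear(8, num_classes)  # classifier head

    def forward(self, x):
        return self.fc(self.features(x))

model = TinyNet()
x = torch.randn(2, 3, 32, 32)
shape_before = model(x).shape   # torch.Size([2, 1000])

# Swap only the head: same input features, new number of classes.
model.fc = nn.Linear(model.fc.in_features, 4)
shape_after = model(x).shape    # torch.Size([2, 4])

print(shape_before, shape_after)
```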
Solution
Step 1: Identify ResNet classifier attribute
ResNet models use model.fc as the classifier head.
Step 2: Check input feature size for ResNet
ResNet50, ResNet101, and ResNet152 have 2048 features before the classifier, so the input size is 2048. (ResNet18 and ResNet34 have 512.)
Final Answer:
model.fc = nn.Linear(2048, 10) -> Option A
Quick Check:
ResNet classifier = model.fc with 2048 input features [OK]
- Using wrong attribute like model.classifier or model.head
- Using wrong input size like 512 instead of 2048
- Confusing ResNet with other models like VGG
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)
model.fc = nn.Linear(512, 5)

input_tensor = torch.randn(1, 3, 224, 224)
output = model(input_tensor)
print(output.shape)
Solution
Step 1: Understand the replaced classifier output size
The new classifier layer outputs 5 values per input (5 classes).
Step 2: Check input batch size and output shape
Input batch size is 1, so output shape is (1, 5).
Final Answer:
torch.Size([1, 5]) -> Option C
Quick Check:
Output shape = (batch_size, output_classes) = (1, 5) [OK]
- Expecting original 1000 classes output
- Confusing feature size with output size
- Misreading input tensor shape as output
model.fc = nn.Linear(1024, 10) but got a runtime error during training. What is the most likely cause?
Solution
Step 1: Check input feature size for classifier
The input size to the new Linear layer must match the output features of the previous layer.
Step 2: Identify mismatch causing runtime error
If 1024 is incorrect, the model will raise size mismatch errors during forward pass.
Final Answer:
The input feature size 1024 does not match the model's actual output features -> Option D
Quick Check:
Input size mismatch causes runtime error [OK]
- Assuming output size causes error
- Confusing eval mode with training errors
- Thinking model.fc is missing in pretrained models
Solution
Step 1: Freeze all existing model parameters
Set param.requires_grad = False for all parameters to prevent updates during training.
Step 2: Replace classifier head with correct input/output sizes
ResNet50's classifier input size is 2048; output size is 15 for new classes.
Step 3: Ensure new head parameters are trainable
By replacing model.fc after freezing, new layer parameters default to requires_grad=True.
Final Answer:
Freeze all params, then replace head with nn.Linear(2048, 15) -> Option B
Quick Check:
Freeze old layers, replace head with correct sizes [OK]
- Freezing after replacing head disables new layer training
- Using wrong input size 512 instead of 2048
- Not freezing any layers when fine-tuning
