
Feature extraction strategy in PyTorch - ML Experiment: Train & Evaluate

Experiment - Feature extraction strategy
Problem: You want to classify images using a neural network. Currently, you train a small model from scratch on a limited dataset.
Current Metrics: Training accuracy: 95%, Validation accuracy: 70%
Issue: The model overfits the training data and performs poorly on new images because the dataset is small and the model memorizes details specific to the training images.
Your Task
Reduce overfitting by using a feature extraction strategy with a pretrained model, aiming for validation accuracy above 85% while keeping training accuracy below 90%.
Use a pretrained model as a fixed feature extractor (do not fine-tune its weights).
Replace only the final classification layer.
Keep training epochs under 20.
Solution
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader

# Data transforms
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

# Load dataset (example: CIFAR10)
train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
val_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)

# Load pretrained model (the weights argument replaces the deprecated pretrained=True)
pretrained_model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze pretrained model parameters
for param in pretrained_model.parameters():
    param.requires_grad = False

# Replace final layer
num_features = pretrained_model.fc.in_features
pretrained_model.fc = nn.Linear(num_features, 10)  # CIFAR10 has 10 classes

# Move model to device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
pretrained_model = pretrained_model.to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(pretrained_model.fc.parameters(), lr=0.001)

# Training loop
num_epochs = 15
for epoch in range(num_epochs):
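    # Eval mode keeps the BatchNorm running statistics fixed, so the frozen backbone
    # stays a fixed feature extractor; the new Linear head trains the same in either mode
    pretrained_model.eval()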
    running_loss = 0.0
    correct = 0
    total = 0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = pretrained_model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * inputs.size(0)
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()

    train_loss = running_loss / total
    train_acc = 100. * correct / total

    # Validation
    pretrained_model.eval()
    val_correct = 0
    val_total = 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = pretrained_model(inputs)
            _, predicted = outputs.max(1)
            val_total += labels.size(0)
            val_correct += predicted.eq(labels).sum().item()
    val_acc = 100. * val_correct / val_total

    print(f'Epoch {epoch+1}/{num_epochs} - Train Loss: {train_loss:.4f} - Train Acc: {train_acc:.2f}% - Val Acc: {val_acc:.2f}%')
Used a pretrained ResNet18 model as a fixed feature extractor.
Froze all pretrained model parameters so they are not updated during training.
Replaced the final fully connected layer to match the number of classes.
Trained only the new final layer with a small learning rate.
Limited training to 15 epochs to avoid overfitting.
Results Interpretation

Before: Training accuracy 95%, Validation accuracy 70% (overfitting)

After: Training accuracy 88%, Validation accuracy 87% (better generalization)

Using a pretrained model as a fixed feature extractor helps reduce overfitting on small datasets by leveraging learned features from large datasets. Training only the final layer improves validation accuracy while keeping training accuracy moderate.
Bonus Experiment
Try fine-tuning the pretrained model by unfreezing some of its layers and training with a lower learning rate.
💡 Hint
Unfreeze the last few layers of the pretrained model and use a smaller learning rate (e.g., 1e-4) to adjust pretrained weights gently.

Practice

1. What is the main purpose of using a pre-trained model for feature extraction in PyTorch?
easy
A. To replace the optimizer with a new one
B. To use learned features from a large dataset and avoid training from scratch
C. To train all layers from random weights
D. To increase the size of the dataset automatically

Solution

  1. Step 1: Understand feature extraction concept

    Feature extraction uses a model already trained on a large dataset to get useful features without training all layers again.
  2. Step 2: Identify the main benefit

    This saves time and resources by reusing learned knowledge instead of starting from scratch.
  3. Final Answer:

    To use learned features from a large dataset and avoid training from scratch -> Option B
  4. Quick Check:

    Feature extraction = reuse learned features [OK]
Hint: Pre-trained means reuse, not retrain all layers [OK]
Common Mistakes:
  • Thinking feature extraction means training all layers
  • Confusing feature extraction with data augmentation
  • Believing optimizer changes are part of feature extraction
2. Which PyTorch code snippet correctly freezes all layers of a pre-trained model except the final layer?
easy
A. for param in model.parameters():
       param.requires_grad = True
   model.fc = nn.Linear(512, 10)
B. model.fc.requires_grad = False
   for param in model.parameters():
       param.requires_grad = True
C. for param in model.parameters():
       param.requires_grad = False
   model.fc = nn.Linear(512, 10)
D. model.fc = nn.Linear(512, 10)
   for param in model.parameters():
       param.requires_grad = False

Solution

  1. Step 1: Freeze all layers by setting requires_grad to false

    The loop disables gradient updates for all parameters to keep pre-trained weights fixed.
  2. Step 2: Replace the final layer with a new one to train

    Assigning a new linear layer to model.fc allows training only this layer for the new task.
  3. Final Answer:

    Freeze every parameter in a loop (param.requires_grad = False), then assign model.fc = nn.Linear(512, 10) -> Option C
  4. Quick Check:

    Freeze all except final layer = freeze first, then replace model.fc [OK]
Hint: Freeze first, then replace final layer [OK]
Common Mistakes:
  • Not freezing layers before replacing final layer
  • Freezing final layer instead of others
  • Setting requires_grad true for all parameters
3. Given this PyTorch code for feature extraction, what will be the output shape of features?
import torch
import torchvision.models as models
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = torch.nn.Identity()
input_tensor = torch.randn(4, 3, 224, 224)
features = model(input_tensor)
print(features.shape)
medium
A. torch.Size([4, 512])
B. torch.Size([4, 1000])
C. torch.Size([4, 3, 224, 224])
D. torch.Size([4, 2048])

Solution

  1. Step 1: Understand model modification

    Replacing model.fc with Identity removes the final classification layer, so output is the feature vector before classification.
  2. Step 2: Know ResNet18 feature size

    ResNet18 outputs a 512-dimensional vector before the final fc layer for each input image.
  3. Final Answer:

    torch.Size([4, 512]) -> Option A
  4. Quick Check:

    ResNet18 features = 512 dims [OK]
Hint: Identity layer outputs feature vector size [OK]
Common Mistakes:
  • Assuming output is 1000 classes without removing fc
  • Confusing batch size with feature dimension
  • Expecting 2048 features from ResNet18 (it's 512)
4. Identify the error in this feature extraction code snippet and select the fix:
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(2048, 5)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Training loop here
medium
A. No error; code is correct
B. Set requires_grad=True for model.fc parameters after replacement
C. Use Adam optimizer instead of SGD
D. Remove freezing of parameters to train all layers

Solution

  1. Step 1: Check freezing timing

    The loop freezes existing parameters before replacing model.fc, so the new fc layer's parameters are created with requires_grad=True by default.
  2. Step 2: Verify optimizer behavior

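    The optimizer receives every parameter, but it skips any parameter whose gradient is None. Frozen parameters never get a gradient, so only the new fc parameters are updated and the backbone remains fixed. Passing only model.fc.parameters() is tidier but not required.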
  3. Final Answer:

    No error; code is correct -> Option A
  4. Quick Check:

    New layer params unfrozen by default [OK]
Hint: New layers have requires_grad=True by default [OK]
Common Mistakes:
  • Assuming that freezing all parameters also freezes layers added afterwards
  • Switching the optimizer when the optimizer is not the problem
  • Removing freezing unnecessarily
5. You want to use a pre-trained ResNet34 to classify 3 classes in your dataset. You freeze all layers except the last one. However, your training accuracy stays very low. What is the best next step to improve feature extraction performance?
hard
A. Reduce batch size to 1 to improve gradient estimates
B. Increase learning rate to 1.0 for faster training
C. Replace the optimizer with SGD without momentum
D. Unfreeze some deeper layers to fine-tune features for your task

Solution

  1. Step 1: Understand freezing impact

    Freezing all but the last layer may limit the model's ability to adapt its features to the new classes, which can keep accuracy low.
  2. Step 2: Fine-tune some deeper layers

    Unfreezing some layers closer to output allows the model to adjust features better for your specific dataset.
  3. Final Answer:

    Unfreeze some deeper layers to fine-tune features for your task -> Option D
  4. Quick Check:

    Fine-tune layers = better adaptation [OK]
Hint: Fine-tune layers if frozen model underperforms [OK]
Common Mistakes:
  • Increasing learning rate too much causes instability
  • Changing optimizer without addressing feature adaptation
  • Reducing batch size unnecessarily