
Feature extraction strategy in PyTorch - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is a feature extraction strategy in machine learning?
It is a method to transform raw data into useful information (features) that a model can learn from more easily.
intermediate
Why do we freeze layers in a pretrained model during feature extraction?
Freezing layers means we do not update their weights during training. This keeps the learned features intact and reduces training time.
beginner
In PyTorch, how do you freeze the parameters of a pretrained model?
You set each parameter's requires_grad attribute to False, like:
for param in model.parameters():
  param.requires_grad = False
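A minimal sketch of the freezing loop, so the effect can be checked by counting trainable parameters. The small `nn.Sequential` model is a hypothetical stand-in for a pretrained network (nothing is downloaded), and the layer sizes are arbitrary.

```python
import torch
from torch import nn

# Hypothetical stand-in for a pretrained network; sizes are arbitrary.
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 4),
)

# Freeze every parameter so autograd computes no gradients for them.
for param in model.parameters():
    param.requires_grad = False

# Count how many scalar weights are still trainable.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} of {total}")
```

Calling `model.requires_grad_(False)` on the module should have the same effect as the loop.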
intermediate
What is the difference between feature extraction and fine-tuning?
Feature extraction uses pretrained model layers as fixed feature detectors. Fine-tuning updates some or all pretrained layers to better fit new data.
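The contrast can be sketched by counting trainable parameters under each strategy. The `make_model` helper and its `backbone`/`head` split are hypothetical names for illustration, not part of any pretrained model's API.

```python
import torch
from torch import nn


def count_trainable(module: nn.Module) -> int:
    """Number of scalar weights that will receive gradients."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


def make_model() -> nn.ModuleDict:
    # Hypothetical stand-in: a "backbone" playing the pretrained layers
    # and a "head" playing the new task-specific layer.
    return nn.ModuleDict({
        "backbone": nn.Sequential(nn.Linear(8, 16), nn.ReLU()),
        "head": nn.Linear(16, 3),
    })


# Feature extraction: the backbone is frozen, only the head learns.
fe_model = make_model()
fe_model["backbone"].requires_grad_(False)

# Fine-tuning: backbone and head are both left trainable.
ft_model = make_model()

fe_trainable = count_trainable(fe_model)
ft_trainable = count_trainable(ft_model)
print(fe_trainable, ft_trainable)
```

In practice fine-tuning often sits between these two extremes, with only some backbone layers unfrozen.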
intermediate
How can you replace the final layer of a pretrained model in PyTorch for feature extraction?
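You assign a new layer to the model's final-layer attribute (fc on ResNet-style models, classifier on some other architectures), where in_features is the input size of the layer being replaced, for example: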
model.fc = nn.Linear(in_features, num_classes)
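One way to avoid hard-coding `in_features` is to read it from the layer being replaced. The sketch below assumes a hypothetical `TinyNet` class whose head is named `fc`, mirroring the ResNet attribute name; `num_classes = 5` is an arbitrary choice.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in with an `fc` head, as on ResNet-style models."""

    def __init__(self) -> None:
        super().__init__()
        self.body = nn.Sequential(nn.Linear(10, 32), nn.ReLU())
        self.fc = nn.Linear(32, 1000)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.body(x))


model = TinyNet()
num_classes = 5  # assumed number of classes in the new task

# Read the input size from the existing head, then swap in a new one.
in_features = model.fc.in_features
model.fc = nn.Linear(in_features, num_classes)

outputs = model(torch.randn(2, 10))
print(outputs.shape)
```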
What does freezing layers in a pretrained model do?
A. Deletes the layers from the model
B. Prevents their weights from updating during training
C. Adds new layers to the model
D. Increases the learning rate for those layers
Which PyTorch attribute controls if a parameter is trainable?
A. requires_grad
B. trainable
C. grad_enabled
D. update_weights
What is the main goal of feature extraction?
A. To increase the size of the dataset
B. To train a model from scratch
C. To use learned features from a pretrained model to help a new task
D. To remove irrelevant data points
When replacing the final layer in a pretrained model, what must you adjust?
A. The optimizer type
B. The input image size
C. The learning rate only
D. The number of output features to match your task classes
Which of these is NOT a benefit of feature extraction?
A. Always achieves perfect accuracy
B. Requires less data
C. Faster training time
D. Uses pretrained knowledge
Explain how to perform feature extraction using a pretrained model in PyTorch.
Think about which parts of the model you keep fixed and which parts you change.
Describe the difference between feature extraction and fine-tuning in transfer learning.
Consider how much of the pretrained model you allow to learn.

      Practice

      1. What is the main purpose of using a pre-trained model for feature extraction in PyTorch?
      easy
      A. To replace the optimizer with a new one
      B. To use learned features from a large dataset and avoid training from scratch
      C. To train all layers from random weights
      D. To increase the size of the dataset automatically

      Solution

      1. Step 1: Understand feature extraction concept

        Feature extraction uses a model already trained on a large dataset to get useful features without training all layers again.
      2. Step 2: Identify the main benefit

        This saves time and resources by reusing learned knowledge instead of starting from scratch.
      3. Final Answer:

        To use learned features from a large dataset and avoid training from scratch -> Option B
      4. Quick Check:

        Feature extraction = reuse learned features [OK]
      Hint: Pre-trained means reuse, not retrain all layers [OK]
      Common Mistakes:
      • Thinking feature extraction means training all layers
      • Confusing feature extraction with data augmentation
      • Believing optimizer changes are part of feature extraction
      2. Which PyTorch code snippet correctly freezes all layers of a pre-trained model except the final layer?
      easy
      A. for param in model.parameters():
             param.requires_grad = True
         model.fc = nn.Linear(512, 10)
      B. model.fc.requires_grad = False
         for param in model.parameters():
             param.requires_grad = True
      C. for param in model.parameters():
             param.requires_grad = False
         model.fc = nn.Linear(512, 10)
      D. model.fc = nn.Linear(512, 10)
         for param in model.parameters():
             param.requires_grad = False

      Solution

      1. Step 1: Freeze all layers by setting requires_grad to False

        The loop disables gradient updates for all existing parameters, keeping the pre-trained weights fixed.
      2. Step 2: Replace the final layer with a new one to train

        Assigning a new linear layer to model.fc after the loop creates fresh parameters with requires_grad=True, so only this layer is trained for the new task.
      3. Final Answer:

        Freeze every parameter in the loop, then assign model.fc = nn.Linear(512, 10) -> Option C
      4. Quick Check:

        Freeze all except final layer = freezing loop first, new model.fc second [OK]
      Hint: Freeze first, then replace final layer [OK]
      Common Mistakes:
      • Not freezing layers before replacing final layer
      • Freezing final layer instead of others
      • Setting requires_grad true for all parameters
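The effect of the ordering in Options C and D can be checked by listing which parameters remain trainable. This is a sketch on a hypothetical `TinyNet` stand-in whose head is named `fc`; no pretrained weights are involved.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in with a 512-dimensional feature vector and an `fc` head."""

    def __init__(self) -> None:
        super().__init__()
        self.body = nn.Sequential(nn.Linear(20, 512), nn.ReLU())
        self.fc = nn.Linear(512, 1000)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.body(x))


def trainable_names(model: nn.Module) -> list:
    """Names of parameters that still require gradients."""
    return [name for name, p in model.named_parameters() if p.requires_grad]


# Option C order: freeze first, then replace the head.
model_c = TinyNet()
for param in model_c.parameters():
    param.requires_grad = False
model_c.fc = nn.Linear(512, 10)

# Option D order: replace the head first, then freeze everything.
model_d = TinyNet()
model_d.fc = nn.Linear(512, 10)
for param in model_d.parameters():
    param.requires_grad = False

names_c = trainable_names(model_c)
names_d = trainable_names(model_d)
print(names_c, names_d)
```

With the Option D ordering the new head is frozen along with everything else, so nothing would be trained.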
      3. Given this PyTorch code for feature extraction, what will be the output shape of features?
      import torch
      import torchvision.models as models
      model = models.resnet18(pretrained=True)
      model.fc = torch.nn.Identity()
      input_tensor = torch.randn(4, 3, 224, 224)
      features = model(input_tensor)
      print(features.shape)
      medium
      A. torch.Size([4, 512])
      B. torch.Size([4, 1000])
      C. torch.Size([4, 3, 224, 224])
      D. torch.Size([4, 2048])

      Solution

      1. Step 1: Understand model modification

        Replacing model.fc with Identity removes the final classification layer, so output is the feature vector before classification.
      2. Step 2: Know ResNet18 feature size

        ResNet18 outputs a 512-dimensional vector before the final fc layer for each input image.
      3. Final Answer:

        torch.Size([4, 512]) -> Option A
      4. Quick Check:

        ResNet18 features = 512 dims [OK]
      Hint: Identity layer outputs feature vector size [OK]
      Common Mistakes:
      • Assuming output is 1000 classes without removing fc
      • Confusing batch size with feature dimension
      • Expecting 2048 features from ResNet18 (it's 512)
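The same idea can be tried without downloading ResNet18 weights. The sketch below uses a hypothetical `TinyBackbone` whose pre-classifier feature size is set to 512 to mirror the question; it is not ResNet18, only an illustration that `nn.Identity` exposes whatever feeds the `fc` layer.

```python
import torch
from torch import nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in: conv -> global average pool -> fc."""

    def __init__(self, feature_dim: int = 512, num_classes: int = 1000) -> None:
        super().__init__()
        self.conv = nn.Conv2d(3, feature_dim, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(feature_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(torch.relu(self.conv(x)))
        x = torch.flatten(x, 1)  # (batch, feature_dim)
        return self.fc(x)


model = TinyBackbone()
model.eval()
x = torch.randn(4, 3, 32, 32)

with torch.no_grad():
    logits = model(x)          # classifier output
    model.fc = nn.Identity()   # pass the features through unchanged
    features = model(x)        # pre-classifier feature vectors

print(logits.shape, features.shape)
```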
      4. Identify the error in this feature extraction code snippet and select the fix:
      import torch
      import torch.nn as nn
      import torchvision.models as models

      model = models.resnet50(pretrained=True)
      for param in model.parameters():
          param.requires_grad = False
      model.fc = nn.Linear(2048, 5)
      optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

      # Training loop here
      medium
      A. No error; code is correct
      B. Set requires_grad=True for model.fc parameters after replacement
      C. Use Adam optimizer instead of SGD
      D. Remove freezing of parameters to train all layers

      Solution

      1. Step 1: Check freezing timing

        The loop freezes existing parameters before replacing model.fc, so the new fc layer's parameters are created with requires_grad=True by default.
      2. Step 2: Verify optimizer behavior

        The optimizer receives every parameter, but frozen ones never get a gradient, so SGD skips them and only the new fc parameters are updated; the backbone remains frozen.
      3. Final Answer:

        No error; code is correct -> Option A
      4. Quick Check:

        New layer params unfrozen by default [OK]
      Hint: New layers have requires_grad=True by default [OK]
      Common Mistakes:
      • Assuming freezing all parameters includes new layers
      • Changing optimizer without fixing requires_grad
      • Removing freezing unnecessarily
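A sketch of one training step that checks this behavior. It passes only the trainable parameters to the optimizer, a common alternative to passing `model.parameters()`. The `backbone` and `head` modules are hypothetical stand-ins, and the data is random.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins for a frozen pretrained backbone and a new head.
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
backbone.requires_grad_(False)
head = nn.Linear(16, 5)
model = nn.Sequential(backbone, head)

# Hand the optimizer only the parameters that can actually change.
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable_params, lr=0.01)

# Snapshot weights so the effect of one step can be compared.
backbone_before = backbone[0].weight.detach().clone()
head_before = head.weight.detach().clone()

inputs = torch.randn(4, 8)
targets = torch.randint(0, 5, (4,))

optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(inputs), targets)
loss.backward()
optimizer.step()

backbone_unchanged = torch.equal(backbone_before, backbone[0].weight)
head_changed = not torch.equal(head_before, head.weight)
print(backbone_unchanged, head_changed)
```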
      5. You want to use a pre-trained ResNet34 to classify 3 classes in your dataset. You freeze all layers except the last one. However, your training accuracy stays very low. What is the best next step to improve feature extraction performance?
      hard
      A. Reduce batch size to 1 to improve gradient estimates
      B. Increase learning rate to 1.0 for faster training
      C. Replace the optimizer with SGD without momentum
      D. Unfreeze some deeper layers to fine-tune features for your task

      Solution

      1. Step 1: Understand freezing impact

        Freezing all but the last layer may limit the model's ability to adapt its features to the new classes, which can cause low accuracy.
      2. Step 2: Fine-tune some deeper layers

        Unfreezing some layers closer to output allows the model to adjust features better for your specific dataset.
      3. Final Answer:

        Unfreeze some deeper layers to fine-tune features for your task -> Option D
      4. Quick Check:

        Fine-tune layers = better adaptation [OK]
      Hint: Fine-tune layers if frozen model underperforms [OK]
      Common Mistakes:
      • Increasing learning rate too much causes instability
      • Changing optimizer without addressing feature adaptation
      • Reducing batch size unnecessarily
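A minimal sketch of partial unfreezing, assuming a hypothetical `TinyNet` whose `layer3`, `layer4`, and `fc` attributes mimic ResNet-style names. The smaller learning rate for the unfrozen block is a common choice rather than a requirement, and the specific values are arbitrary.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in; attribute names mimic ResNet-style blocks."""

    def __init__(self) -> None:
        super().__init__()
        self.layer3 = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
        self.layer4 = nn.Sequential(nn.Linear(16, 16), nn.ReLU())
        self.fc = nn.Linear(16, 3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.layer4(self.layer3(x)))


model = TinyNet()

# Start from the feature-extraction setup: everything frozen.
model.requires_grad_(False)
# Then unfreeze the block closest to the output, plus the head.
model.layer4.requires_grad_(True)
model.fc.requires_grad_(True)

# Per-group learning rates: gentle updates for the unfrozen block,
# the default rate for the freshly initialized head.
optimizer = torch.optim.SGD(
    [
        {"params": model.layer4.parameters(), "lr": 1e-4},
        {"params": model.fc.parameters()},
    ],
    lr=1e-2,
    momentum=0.9,
)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```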