Bird
Raised Fist0
Computer Visionml~20 mins

Pre-trained models (ResNet, VGG, EfficientNet) in Computer Vision - ML Experiment: Train & Evaluate

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Experiment - Pre-trained models (ResNet, VGG, EfficientNet)
Problem:You want to classify images into 10 categories using a pre-trained model. Currently, you use a ResNet50 model fine-tuned on your dataset. The training accuracy is 98%, but validation accuracy is only 75%. This shows overfitting.
Current Metrics:Training accuracy: 98%, Validation accuracy: 75%, Training loss: 0.05, Validation loss: 0.85
Issue:The model overfits the training data, causing poor validation accuracy. The gap between training and validation accuracy is large.
Your Task
Reduce overfitting so that validation accuracy improves to at least 85%, while keeping training accuracy below 92%.
You must use the ResNet50 pre-trained model as base.
You can only change hyperparameters like learning rate, batch size, epochs, and add regularization techniques.
Do not change the dataset or model architecture.
Hint 1
Hint 2
Hint 3
Hint 4
Solution
Computer Vision
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping

# Load pre-trained ResNet50 without top layer
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(224,224,3))
base_model.trainable = False  # Freeze base model

# Add custom layers
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dropout(0.5)(x)  # Added dropout to reduce overfitting
predictions = Dense(10, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)

# Compile model with lower learning rate
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Data augmentation
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    validation_split=0.2
)

train_generator = train_datagen.flow_from_directory(
    'path_to_dataset',
    target_size=(224,224),
    batch_size=32,
    class_mode='categorical',
    subset='training'
)

val_datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2
)

validation_generator = val_datagen.flow_from_directory(
    'path_to_dataset',
    target_size=(224,224),
    batch_size=32,
    class_mode='categorical',
    subset='validation'
)

# Early stopping callback
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

# Train model
history = model.fit(
    train_generator,
    epochs=30,
    validation_data=validation_generator,
    callbacks=[early_stop]
)
Added a Dropout layer with rate 0.5 after global average pooling to reduce overfitting.
Used data augmentation with rotation, shifts, and flips to increase training data variety.
Lowered learning rate from default to 0.0001 for smoother training.
Added EarlyStopping callback to stop training when validation loss stops improving.
Results Interpretation

Before: Training accuracy 98%, Validation accuracy 75%, large gap showing overfitting.

After: Training accuracy 90%, Validation accuracy 87%, smaller gap showing better generalization.

Adding dropout, data augmentation, lowering learning rate, and early stopping helps reduce overfitting. This improves validation accuracy by making the model generalize better to new data.
Bonus Experiment
Try using EfficientNetB0 pre-trained model instead of ResNet50 and compare validation accuracy.
💡 Hint
EfficientNet models are more efficient and may give better accuracy with fewer parameters. Use similar fine-tuning steps.

Practice

(1/5)
1. Which of the following is a key advantage of using pre-trained models like ResNet, VGG, or EfficientNet in computer vision tasks?
easy
A. They reduce the size of the input images automatically.
B. They save training time by using knowledge from large datasets.
C. They only work for text data, not images.
D. They always require training from scratch for every new task.

Solution

  1. Step 1: Understand what pre-trained models do

    Pre-trained models are trained on large datasets and learn useful features that can be reused.
  2. Step 2: Identify the benefit in context

    Using these models saves time because you don't need to train from scratch for every new task.
  3. Final Answer:

    They save training time by using knowledge from large datasets. -> Option B
  4. Quick Check:

    Pre-trained models save time = D [OK]
Hint: Pre-trained means already trained on big data [OK]
Common Mistakes:
  • Thinking pre-trained models need full retraining
  • Confusing image and text data applicability
  • Assuming input size changes automatically
2. Which of the following is the correct way to load a pre-trained ResNet model in PyTorch?
easy
A. model = torch.load('resnet50')
B. model = torchvision.load_resnet50()
C. model = torchvision.models.ResNet50(weights='imagenet')
D. model = torchvision.models.resnet50(pretrained=True)

Solution

  1. Step 1: Recall PyTorch syntax for loading pre-trained models

    In PyTorch, pre-trained models are loaded via torchvision.models with pretrained=True argument.
  2. Step 2: Check each option

    model = torchvision.models.resnet50(pretrained=True) uses correct function and argument. Others are incorrect or invalid syntax.
  3. Final Answer:

    model = torchvision.models.resnet50(pretrained=True) -> Option D
  4. Quick Check:

    PyTorch pre-trained flag = pretrained=True [OK]
Hint: Use pretrained=True in torchvision.models [OK]
Common Mistakes:
  • Using torch.load for model architecture
  • Wrong function names like load_resnet50
  • Incorrect argument names like weights='imagenet'
3. Consider this PyTorch code snippet using a pre-trained VGG16 model:
import torchvision.models as models
model = models.vgg16(pretrained=True)
print(type(model.features))
What will be the output type of model.features?
medium
A. <class 'torch.nn.Linear'>
B. <class 'torch.nn.ModuleList'>
C. <class 'torch.nn.Sequential'>
D. <class 'torch.nn.Conv2d'>

Solution

  1. Step 1: Understand VGG16 model structure in PyTorch

    VGG16's feature extractor is implemented as a torch.nn.Sequential container of layers.
  2. Step 2: Identify the type of model.features

    model.features groups convolutional layers in a Sequential module, so its type is torch.nn.Sequential.
  3. Final Answer:

    <class 'torch.nn.Sequential'> -> Option C
  4. Quick Check:

    VGG features = Sequential container [OK]
Hint: VGG features are in Sequential container [OK]
Common Mistakes:
  • Confusing Sequential with ModuleList
  • Thinking features is a single layer like Linear or Conv2d
  • Not knowing PyTorch container types
4. You try to fine-tune a pre-trained EfficientNet model but get an error: AttributeError: module 'torchvision.models' has no attribute 'efficientnet'. What is the most likely cause?
medium
A. Your torchvision version is outdated and does not include EfficientNet.
B. You forgot to import torch.
C. EfficientNet models are not available in PyTorch.
D. You need to set pretrained=True to access EfficientNet.

Solution

  1. Step 1: Understand the error message

    The error says torchvision.models has no attribute 'efficientnet', meaning the function is missing.
  2. Step 2: Check common causes

    EfficientNet was added in newer torchvision versions. An outdated version lacks it.
  3. Final Answer:

    Your torchvision version is outdated and does not include EfficientNet. -> Option A
  4. Quick Check:

    Missing attribute = outdated torchvision [OK]
Hint: Check torchvision version for model availability [OK]
Common Mistakes:
  • Assuming import torch fixes model availability
  • Thinking EfficientNet is not in PyTorch at all
  • Confusing pretrained flag with missing attribute
5. You want to build an image classifier for a small dataset with limited computing power. Which pre-trained model is the best choice to balance accuracy and efficiency?
hard
A. EfficientNet, because it scales well and is efficient for small data.
B. VGG16, because it is simple but very large and slow.
C. ResNet50, because it is very deep and accurate but heavy.
D. Train a new model from scratch for best results.

Solution

  1. Step 1: Consider dataset size and computing power

    Small data and limited power require efficient models to avoid overfitting and long training.
  2. Step 2: Compare model characteristics

    ResNet50 is accurate but heavy; VGG16 is large and slow; EfficientNet is designed for efficiency and good accuracy.
  3. Step 3: Choose the best fit

    EfficientNet balances accuracy and efficiency, making it ideal for small datasets and limited resources.
  4. Final Answer:

    EfficientNet, because it scales well and is efficient for small data. -> Option A
  5. Quick Check:

    Efficiency + accuracy = EfficientNet [OK]
Hint: EfficientNet balances speed and accuracy well [OK]
Common Mistakes:
  • Choosing heavy models for small data
  • Ignoring efficiency for limited computing power
  • Thinking training from scratch is always better