torchvision pre-trained models in PyTorch - ML Experiment: Train & Evaluate

Experiment - torchvision pre-trained models
Problem: You want to classify images into categories using a deep learning model. You are using a torchvision pre-trained model (ResNet18) on a small custom dataset. The model trains quickly and achieves 98% accuracy on the training set but only 70% accuracy on the validation set.
Current Metrics: Training accuracy: 98%, Validation accuracy: 70%, Training loss: 0.05, Validation loss: 0.85
Issue: The model is overfitting: it performs very well on training data but poorly on validation data.
Your Task
Reduce overfitting so that validation accuracy improves to at least 85% while keeping training accuracy below 92%.
You must use the torchvision pre-trained ResNet18 model.
You can only modify training hyperparameters and add regularization techniques.
Do not change the dataset or model architecture drastically.
Solution
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader

# Data augmentation and normalization for training
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Normalization for validation
val_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Load datasets
train_dataset = datasets.FakeData(transform=train_transforms)  # Replace with real dataset
val_dataset = datasets.FakeData(transform=val_transforms)      # Replace with real dataset

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)

# Load pretrained ResNet18 (weights= replaces the deprecated pretrained=True)
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
num_ftrs = model.fc.in_features
model.fc = nn.Sequential(
    nn.Dropout(0.5),  # Added dropout
    nn.Linear(num_ftrs, 10)  # Assuming 10 classes
)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.0005, weight_decay=1e-4)  # Added weight decay

num_epochs = 10
best_val_acc = 0.0

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    running_corrects = 0
    total = 0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * inputs.size(0)
        _, preds = torch.max(outputs, 1)
        running_corrects += torch.sum(preds == labels)
        total += labels.size(0)
    train_loss = running_loss / total
    train_acc = running_corrects.double() / total

    model.eval()
    val_loss = 0.0
    val_corrects = 0
    val_total = 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            val_loss += loss.item() * inputs.size(0)
            _, preds = torch.max(outputs, 1)
            val_corrects += torch.sum(preds == labels)
            val_total += labels.size(0)
    val_loss /= val_total
    val_acc = val_corrects.double() / val_total

    if val_acc > best_val_acc:
        best_val_acc = val_acc

    print(f'Epoch {epoch+1}/{num_epochs} - '
          f'Train loss: {train_loss:.4f}, Train acc: {train_acc:.4f} - '
          f'Val loss: {val_loss:.4f}, Val acc: {val_acc:.4f}')
Added data augmentation to training data to increase variety.
Added dropout layer before the final fully connected layer to reduce overfitting.
Used weight decay (L2 regularization) in the Adam optimizer.
Reduced learning rate to 0.0005 for smoother training.
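The dropout change only takes effect in training mode, which is why the loop switches between model.train() and model.eval(). The following is a minimal sketch of that behavior on a standalone nn.Dropout layer; the tensor of ones and the seed are illustrative assumptions, not part of the solution above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # only makes the random mask repeatable

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()
train_out = drop(x)  # each element is either 0 or scaled to 1 / (1 - p) = 2

drop.eval()
eval_out = drop(x)   # dropout is a no-op in eval mode

print(train_out)
print(eval_out)
```

Because surviving activations are scaled up during training, no rescaling is needed at validation time.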
Results Interpretation

Before: Training accuracy 98%, Validation accuracy 70%, Training loss 0.05, Validation loss 0.85

After: Training accuracy 90%, Validation accuracy 87%, Training loss 0.25, Validation loss 0.40

Adding dropout, weight decay, and data augmentation helps reduce overfitting. The model generalizes better, improving validation accuracy while slightly lowering training accuracy.
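As a quick worked check, the gap between training and validation accuracy can be computed directly from the reported numbers. This is only an arithmetic sketch; the dictionary layout and the helper name gap_points are hypothetical.

```python
# Accuracies in percent, copied from the Before/After lines above
before = {"train_acc": 98, "val_acc": 70}
after = {"train_acc": 90, "val_acc": 87}

def gap_points(metrics):
    """Train-minus-validation accuracy, in percentage points."""
    return metrics["train_acc"] - metrics["val_acc"]

gap_before = gap_points(before)  # 28 points
gap_after = gap_points(after)    # 3 points

# Task targets: validation accuracy at least 85%, training accuracy below 92%
meets_targets = after["val_acc"] >= 85 and after["train_acc"] < 92

print(gap_before, gap_after, meets_targets)
```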
Bonus Experiment
Try using a different pretrained model like MobileNetV2 and compare the validation accuracy and training time.
💡 Hint
MobileNetV2 is lighter and faster. Adjust learning rate and batch size accordingly.

Practice

1. What is the main advantage of using torchvision pre-trained models?
easy
A. They automatically improve your dataset quality.
B. They generate new images from text descriptions.
C. They reduce the size of your images.
D. They allow you to use powerful image models without training from scratch.

Solution

  1. Step 1: Understand what pre-trained models do

    Pre-trained models are already trained on large datasets, so you don't need to train them from zero.
  2. Step 2: Identify the main benefit

    This saves time and resources, letting you use powerful models quickly.
  3. Final Answer:

    They allow you to use powerful image models without training from scratch. -> Option D
  4. Quick Check:

    Pre-trained models = reuse trained weights [OK]
Hint: Pre-trained means ready to use without full training [OK]
Common Mistakes:
  • Thinking pre-trained models improve data quality
  • Confusing pre-trained models with image resizing
  • Believing they generate images from text
2. Which of the following is the correct way to load a pre-trained ResNet18 model from torchvision?
easy
A. model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1)
B. model = torchvision.resnet18(pretrained=True)
C. model = torchvision.models.resnet18(pretrained=False)
D. model = torchvision.models.load_resnet18(pretrained=True)

Solution

  1. Step 1: Recall the updated torchvision syntax

    As of torchvision 0.13, pre-trained weights are selected with the 'weights' argument and a weights enum; 'pretrained=True' is deprecated.
  2. Step 2: Identify the correct syntax for ResNet18

    Use torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1).
  3. Final Answer:

    model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1) -> Option A
  4. Quick Check:

    Use weights=enum, not pretrained=True [OK]
Hint: Use weights= argument, not pretrained=True [OK]
Common Mistakes:
  • Using pretrained=False which doesn't load pre-trained weights
  • Calling torchvision.resnet18 directly
  • Using a non-existent load_resnet18 function
3. What will be the output shape of the following code snippet using a pre-trained ResNet18 model on a batch of 8 RGB images of size 224x224?
import torch
import torchvision.models as models
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.eval()
inputs = torch.randn(8, 3, 224, 224)
outputs = model(inputs)
print(outputs.shape)
medium
A. torch.Size([8, 3, 224, 224])
B. torch.Size([8, 1000])
C. torch.Size([8, 512])
D. torch.Size([1, 1000])

Solution

  1. Step 1: Understand ResNet18 output size

    ResNet18 pre-trained on ImageNet outputs logits for 1000 classes, so output shape is (batch_size, 1000).
  2. Step 2: Check input batch size and output shape

    Input batch size is 8, so output shape is (8, 1000).
  3. Final Answer:

    torch.Size([8, 1000]) -> Option B
  4. Quick Check:

    Batch size 8, 1000 classes output [OK]
Hint: Output shape = (batch_size, number_of_classes) [OK]
Common Mistakes:
  • Confusing output with input image shape
  • Expecting feature vector size instead of class logits
  • Assuming batch size 1 output
4. You wrote this code to use a pre-trained model for prediction but get wrong results:
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1)
inputs = torch.randn(1, 3, 224, 224)
outputs = model(inputs)
What is the likely mistake?
medium
A. Input tensor shape should be (3, 224, 224) without batch dimension.
B. You need to set weights=None to use pre-trained weights.
C. You forgot to call model.eval() before prediction.
D. You must convert inputs to numpy arrays before passing to model.

Solution

  1. Step 1: Check model mode for prediction

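    Pre-trained models must be set to evaluation mode with model.eval() before prediction. In training mode, batch norm layers normalize with the current batch's statistics and keep updating their running averages, and any dropout layers stay active.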
  2. Step 2: Identify the missing step

    The code misses model.eval(), so outputs may be incorrect or inconsistent.
  3. Final Answer:

    You forgot to call model.eval() before prediction. -> Option C
  4. Quick Check:

    Set model.eval() before inference [OK]
Hint: Always call model.eval() before predicting [OK]
Common Mistakes:
  • Not calling model.eval() before inference
  • Wrong input tensor shape without batch
  • Trying to convert tensors to numpy before model
5. You want to fine-tune a pre-trained ResNet18 model on your own 5-class dataset. Which of the following code snippets correctly replaces the final layer for this task?
hard
A. model.fc = torch.nn.Linear(in_features=512, out_features=5)
B. model.classifier = torch.nn.Linear(in_features=1000, out_features=5)
C. model.fc = torch.nn.Linear(in_features=2048, out_features=5)
D. model.output = torch.nn.Linear(in_features=512, out_features=1000)

Solution

  1. Step 1: Identify the final layer of ResNet18

    ResNet18's final fully connected layer is model.fc with input features 512 and output 1000 classes.
  2. Step 2: Replace final layer for 5 classes

    To fine-tune, replace model.fc with a new Linear layer with 512 inputs and 5 outputs.
  3. Final Answer:

    model.fc = torch.nn.Linear(in_features=512, out_features=5) -> Option A
  4. Quick Check:

    Replace model.fc with correct output size [OK]
Hint: Replace model.fc with Linear(512, number_of_classes) [OK]
Common Mistakes:
  • Replacing wrong attribute like model.classifier
  • Using wrong input feature size (2048 instead of 512)
  • Not changing output features to dataset classes