
Why pre-trained models accelerate development in PyTorch - Experiment to Prove It

Experiment - Why pre-trained models accelerate development

Problem: You want to build an image classifier, but training a model from scratch takes a long time and needs a lot of data.

Current Metrics: Training accuracy: 95%, Validation accuracy: 60%, Training loss: 0.15, Validation loss: 1.2

Issue: The model overfits the training data and performs poorly on new images because it has not learned general features well.

Your Task

Use a pre-trained model to improve validation accuracy to above 85% while reducing training time.

You must use PyTorch and a pre-trained model from torchvision.

You can only fine-tune the last layer of the model.

Do not change the dataset or add more data.

Solution

PyTorch

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader

# Data transforms
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Load dataset (example: CIFAR-10)
train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
val_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)

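# Load ResNet18 with pre-trained ImageNet weights
# (weights= replaces the deprecated pretrained=True; requires torchvision >= 0.13)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)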

# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 10)  # CIFAR-10 has 10 classes

# Only parameters of final layer are trainable
params_to_update = model.fc.parameters()

# Use GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(params_to_update, lr=0.001)

# Training loop
for epoch in range(5):  # small number of epochs for demo
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * inputs.size(0)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    train_loss = running_loss / total
    train_acc = 100 * correct / total

    # Validation
    model.eval()
    val_loss = 0.0
    val_correct = 0
    val_total = 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            val_loss += loss.item() * inputs.size(0)
            _, predicted = torch.max(outputs, 1)
            val_total += labels.size(0)
            val_correct += (predicted == labels).sum().item()

    val_loss /= val_total
    val_acc = 100 * val_correct / val_total

    print(f'Epoch {epoch+1}: Train Loss={train_loss:.4f}, Train Acc={train_acc:.2f}%, Val Loss={val_loss:.4f}, Val Acc={val_acc:.2f}%')

Loaded a pre-trained ResNet18 model instead of training from scratch.

Froze all pre-trained layers so their weights are not updated during training.

Replaced the last fully connected layer with a new one that matches the number of classes in the dataset.

Trained only the new last layer, using Adam with a learning rate of 0.001.

Resized the images to 224x224 and normalized them with the ImageNet mean and standard deviation, matching the input the pre-trained model expects.

Results Interpretation

Before: Training accuracy 95%, Validation accuracy 60%, high overfitting.

After: Training accuracy 88%, Validation accuracy 87%, much better generalization.

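A pre-trained model already encodes general visual features learned from a large dataset (ImageNet, for torchvision's ResNet18), so the classifier does not have to learn them from the smaller task dataset. Training only the last layer updates far fewer parameters, which shortens training and leaves less room to overfit, so validation accuracy improves quickly.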
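
One simple way to quantify the overfitting described here is the gap between training and validation accuracy. The snippet below is a small illustration using the figures reported in this experiment; `generalization_gap` is a hypothetical helper name, not part of PyTorch.

```python
def generalization_gap(train_acc, val_acc):
    """Training accuracy minus validation accuracy, in percentage points."""
    return train_acc - val_acc


# Accuracy figures reported in this experiment.
before = generalization_gap(95, 60)
after = generalization_gap(88, 87)

print(f"Gap before: {before} points")  # 35
print(f"Gap after: {after} points")    # 1
```

A large gap suggests the model is memorizing the training set; a gap near zero suggests training and validation performance are moving together.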

Bonus Experiment

Try unfreezing some of the frozen layers and fine-tuning them along with the last layer to see if validation accuracy improves further.

💡 Hint

Unfreeze the last few layers and use a smaller learning rate for them to avoid destroying learned features.

Practice

1. Why do pre-trained models help speed up AI development in PyTorch?

easy

A. They always produce perfect results without any training.

B. They start with knowledge learned from other data, reducing training time.

C. They require more data to train from scratch.

D. They avoid the need for any coding or model building.

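Solution

Step 1: Understand pre-trained model concept

A pre-trained model has already learned useful features from a large dataset.

Step 2: Relate to training time

Starting from those learned weights means less training is needed on the new task than when training from scratch.

Final Answer:

B. They start with knowledge learned from other data, reducing training time.

Quick Check:

Reused knowledge means less to learn, so training is shorter.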
