
torchvision pre-trained models in PyTorch - ML Experiment: Train & Evaluate

Experiment - torchvision pre-trained models
Problem: You want to classify images into categories using a deep learning model. You are using a torchvision pre-trained model (ResNet18) on a small custom dataset. The model trains quickly and achieves 98% accuracy on the training set but only 70% accuracy on the validation set.
Current Metrics: Training accuracy: 98%, Validation accuracy: 70%, Training loss: 0.05, Validation loss: 0.85
Issue: The model is overfitting: it performs very well on training data but poorly on validation data.
Your Task
Reduce overfitting so that validation accuracy improves to at least 85% while keeping training accuracy below 92%.
You must use the torchvision pre-trained ResNet18 model.
You can only modify training hyperparameters and add regularization techniques.
Do not change the dataset or model architecture drastically.
Solution
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader

# Data augmentation and normalization for training
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Normalization for validation
val_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Load datasets
train_dataset = datasets.FakeData(transform=train_transforms)  # Replace with real dataset
val_dataset = datasets.FakeData(transform=val_transforms)      # Replace with real dataset

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)

# Load pretrained ResNet18 (torchvision >= 0.13 uses `weights`; `pretrained=True` is deprecated)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
num_ftrs = model.fc.in_features
model.fc = nn.Sequential(
    nn.Dropout(0.5),  # Added dropout
    nn.Linear(num_ftrs, 10)  # Assuming 10 classes
)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.0005, weight_decay=1e-4)  # Added weight decay

num_epochs = 10
best_val_acc = 0.0

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    running_corrects = 0
    total = 0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * inputs.size(0)
        _, preds = torch.max(outputs, 1)
        running_corrects += torch.sum(preds == labels)
        total += labels.size(0)
    train_loss = running_loss / total
    train_acc = running_corrects.double() / total

    model.eval()
    val_loss = 0.0
    val_corrects = 0
    val_total = 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            val_loss += loss.item() * inputs.size(0)
            _, preds = torch.max(outputs, 1)
            val_corrects += torch.sum(preds == labels)
            val_total += labels.size(0)
    val_loss /= val_total
    val_acc = val_corrects.double() / val_total

    # Track the best validation accuracy seen so far
    if val_acc > best_val_acc:
        best_val_acc = val_acc

    print(f'Epoch {epoch+1}/{num_epochs} - '
          f'Train loss: {train_loss:.4f}, Train acc: {train_acc:.4f} - '
          f'Val loss: {val_loss:.4f}, Val acc: {val_acc:.4f}')
Added data augmentation to training data to increase variety.
Added dropout layer before the final fully connected layer to reduce overfitting.
Used weight decay (L2 regularization) in the Adam optimizer.
Reduced learning rate to 0.0005 for smoother training.
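To make the dropout change more concrete, here is a minimal sketch of how `nn.Dropout` behaves in the two modes that the training loop switches between with `model.train()` and `model.eval()`. The tensor size and seed are arbitrary choices for illustration, not part of the original solution.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability of this sketch

dropout = nn.Dropout(p=0.5)  # same drop probability as in the solution's head
x = torch.ones(1000)         # illustrative input, not real features

# Training mode: each element is zeroed with probability p,
# and the survivors are scaled by 1 / (1 - p) = 2.
dropout.train()
train_out = dropout(x)
train_values = sorted(train_out.unique().tolist())

# Evaluation mode: dropout is a no-op, so the input passes through unchanged.
dropout.eval()
eval_out = dropout(x)
eval_is_identity = torch.equal(eval_out, x)

print(train_values)      # the distinct values seen in training mode
print(eval_is_identity)  # whether eval mode left the input untouched
```

This is why the validation pass must run under `model.eval()`: otherwise the random masking would also distort validation predictions.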
Results Interpretation

Before: Training accuracy 98%, Validation accuracy 70%, Training loss 0.05, Validation loss 0.85

After: Training accuracy 90%, Validation accuracy 87%, Training loss 0.25, Validation loss 0.40

Adding dropout, weight decay, and data augmentation helps reduce overfitting. The model generalizes better, improving validation accuracy while slightly lowering training accuracy.
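One simple way to read these numbers is through the gap between training and validation accuracy. The sketch below uses only the figures reported above; the helper names `generalization_gap` and `meets_task_targets` are hypothetical, introduced here for illustration.

```python
def generalization_gap(train_acc_pct: int, val_acc_pct: int) -> int:
    """Return the train/validation accuracy gap in percentage points."""
    return train_acc_pct - val_acc_pct


def meets_task_targets(train_acc_pct: int, val_acc_pct: int) -> bool:
    """Check the task's stated targets: validation >= 85% and training < 92%."""
    return val_acc_pct >= 85 and train_acc_pct < 92


# Accuracies as reported in the Before/After lines above (in percent).
gap_before = generalization_gap(98, 70)
gap_after = generalization_gap(90, 87)

print(f"Gap before: {gap_before} points")  # 28
print(f"Gap after: {gap_after} points")    # 3
print(meets_task_targets(98, 70))          # False
print(meets_task_targets(90, 87))          # True
```

A gap that shrinks from 28 points to 3 points, together with lower validation loss, is the signal that the model is relying less on memorizing the training set.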
Bonus Experiment
Try using a different pretrained model like MobileNetV2 and compare the validation accuracy and training time.
💡 Hint
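MobileNetV2 is lighter than ResNet18 (roughly 3.5M parameters versus roughly 11.7M). Its final layer lives in model.classifier rather than model.fc. Adjust learning rate and batch size accordingly.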