How to Use Transfer Learning in PyTorch: Simple Guide
To use transfer learning in PyTorch, load a pretrained model from `torchvision.models`, freeze its early layers to keep the learned features, and replace the final layer to match your task. Then train only the new layers, or fine-tune the whole model on your dataset.

Syntax
Transfer learning in PyTorch typically involves these steps:
- Load a pretrained model from `torchvision.models`.
- Freeze layers by setting `param.requires_grad = False` to keep learned features.
- Replace the final classification layer to fit your number of classes.
- Define optimizer and loss, then train the model.
```python
import torch
import torchvision.models as models

# Load pretrained model
# (newer torchvision versions prefer weights=models.ResNet18_Weights.DEFAULT
# over the deprecated pretrained=True)
model = models.resnet18(pretrained=True)

# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer
num_features = model.fc.in_features
num_classes = 10  # num_classes is your target count
model.fc = torch.nn.Linear(num_features, num_classes)

# Only parameters of the final layer will be updated
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)

# Define loss
criterion = torch.nn.CrossEntropyLoss()
```
Example
This example shows how to use transfer learning with ResNet18 on a custom dataset with 2 classes. It freezes pretrained layers and trains only the final layer.
```python
import torch
import torchvision.models as models
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Setup device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Data transforms (ImageNet mean and std, as expected by the pretrained weights)
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Load example dataset (replace with your own);
# num_classes must match the new head, otherwise the loss will error
dataset = datasets.FakeData(num_classes=2, transform=transform)
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)

# Load pretrained ResNet18
model = models.resnet18(pretrained=True)

# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Replace final layer for 2 classes
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 2)
model = model.to(device)

# Only train the final layer
optimizer = optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Training loop (1 epoch for demo)
model.train()
for inputs, labels in dataloader:
    inputs, labels = inputs.to(device), labels.to(device)
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    print(f'Loss: {loss.item():.4f}')
    break  # run one batch for demo
```
Output
Loss: 0.6931
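A loss near 0.6931 is what you should expect here: it equals ln(2), the cross-entropy of an untrained two-class classifier that assigns equal probability to both classes. Once the new head has been trained, inference follows the usual PyTorch pattern. A minimal sketch, reusing the `model`, `device`, and `transform` defined in the example above (`example.jpg` is a placeholder path):

```python
from PIL import Image

model.eval()  # disable dropout and batch-norm updates
with torch.no_grad():  # no gradients needed for inference
    image = Image.open('example.jpg').convert('RGB')  # placeholder path
    batch = transform(image).unsqueeze(0).to(device)  # add a batch dimension
    logits = model(batch)
    predicted_class = logits.argmax(dim=1).item()
    print(f'Predicted class: {predicted_class}')
```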
Common Pitfalls
Common mistakes when using transfer learning in PyTorch include:
- Not freezing pretrained layers, causing slow training and overfitting.
- Forgetting to replace the final layer to match your number of classes.
- Passing all model parameters to the optimizer instead of only trainable ones.
- Not normalizing input images with the same mean and std the pretrained model was trained with (a correct pipeline is sketched after the code below).
```python
import torch
import torchvision.models as models

model = models.resnet18(pretrained=True)

# Wrong: not freezing layers
# optimizer = torch.optim.SGD(model.parameters(), lr=0.001)  # trains all layers

# Right: freeze layers, then train only the new head
for param in model.parameters():
    param.requires_grad = False

num_features = model.fc.in_features
model.fc = torch.nn.Linear(num_features, 10)  # example: 10 classes

# Only train the final layer
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.001)
```
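The normalization pitfall is easy to miss because mismatched statistics raise no error; they only degrade accuracy. Models from `torchvision.models` were pretrained on ImageNet, so inputs should use the ImageNet mean and std (the same values as in the example above). A minimal sketch of a correct preprocessing pipeline:

```python
from torchvision import transforms

# ImageNet statistics used when the torchvision models were pretrained
imagenet_mean = [0.485, 0.456, 0.406]
imagenet_std = [0.229, 0.224, 0.225]

# Wrong: no normalization, so inputs don't match the pretrained distribution
# transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

# Right: normalize with the same statistics the pretrained model expects
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(imagenet_mean, imagenet_std),
])
```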
Quick Reference
Summary tips for transfer learning in PyTorch:
- Use `torchvision.models` to get pretrained models.
- Freeze early layers by setting `param.requires_grad = False`.
- Replace the final layer to match your task's classes.
- Normalize inputs with pretrained model's mean and std.
- Train only the new layers first, or fine-tune by unfreezing some layers later (see the sketch below).
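When the new head has converged and you want more accuracy, you can unfreeze part of the backbone and keep training with a smaller learning rate for the pretrained weights. A minimal sketch, assuming the `model` from the examples above; unfreezing only `layer4` (the last ResNet block) is one common choice, not the only one:

```python
import torch

# Unfreeze the last residual block; the rest of the backbone stays frozen
for param in model.layer4.parameters():
    param.requires_grad = True

# Use a lower learning rate for pretrained weights than for the new head
optimizer = torch.optim.SGD([
    {'params': model.layer4.parameters(), 'lr': 1e-4},
    {'params': model.fc.parameters(), 'lr': 1e-3},
], momentum=0.9)

# Alternatively, pass only the trainable parameters in a single group:
# optimizer = torch.optim.SGD(
#     filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4, momentum=0.9)
```

Passing only trainable parameters to the optimizer also avoids the third pitfall above, since frozen weights are never registered for updates.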
Key Takeaways
- Load a pretrained model and freeze its layers to keep learned features.
- Replace the final layer to fit your number of output classes.
- Train only the new layers initially for faster, more stable training.
- Normalize input images with the pretrained model's expected mean and std.
- Fine-tune by unfreezing layers later if higher accuracy is needed.