
Fine-tuning strategy in PyTorch

Introduction

Fine-tuning adapts a pre-trained model to a new task by adjusting it slightly instead of training from scratch, so it learns faster and needs less data.

You want to teach a model to recognize new types of images but have limited data.
You have a language model trained on general text and want it to work well on medical documents.
You want to improve a speech recognition model for a specific accent.
You want to save time and computing power by building on an existing model.
You want to customize a chatbot to understand your company's products better.
Syntax
PyTorch
import torch
import torchvision.models as models
model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
optimizer = torch.optim.Adam(model.fc.parameters(), lr=0.001)

# Training loop only updates model.fc parameters

Set requires_grad = False to freeze layers you don't want to change.

Replace the last layer to match your new task's output classes.

Examples
Freeze all layers except the last fully connected layer for a 10-class problem.
PyTorch
import torch
import torchvision.models as models
model = models.resnet50(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
model.fc = torch.nn.Linear(model.fc.in_features, 10)  # 10 classes
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01)
Fine-tune only the last block of layers (layer4) while freezing the rest.
PyTorch
for name, param in model.named_parameters():
    if 'layer4' in name:
        param.requires_grad = True
    else:
        param.requires_grad = False
Only optimize parameters that require gradients (unfrozen layers).
PyTorch
optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=0.0001)
Sample Model

This code fine-tunes only the last layer of a ResNet18 model on a small dummy dataset with 3 classes.

PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import models

# Load pre-trained model
model = models.resnet18(pretrained=True)

# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Replace the last layer for 3 classes
num_classes = 3
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only parameters of final layer will be updated
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)

# Dummy input and target
inputs = torch.randn(5, 3, 224, 224)
targets = torch.tensor([0, 1, 2, 1, 0])

# Loss function
criterion = nn.CrossEntropyLoss()

# Training step
model.train()
optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, targets)
loss.backward()
optimizer.step()

# Print loss and predictions
print(f"Loss: {loss.item():.4f}")
_, preds = torch.max(outputs, 1)
print(f"Predictions: {preds.tolist()}")
Important Notes

Freezing layers helps keep learned features and reduces training time.

Fine-tuning too many layers with little data can cause overfitting.

Adjust the learning rate carefully; rates smaller than those used for training from scratch usually work better for fine-tuning.

Summary

Fine-tuning adjusts a pre-trained model to a new task by training some layers.

Freeze layers you want to keep unchanged by setting requires_grad = False.

Replace the last layer to match your new task's output classes and train only it or a few layers.