
How to Use ResNet for Image Classification in Computer Vision

To use ResNet for image classification, load a pretrained ResNet model, replace its final layer to match your number of classes, and train or fine-tune it on your dataset. In PyTorch, the torchvision package provides the ResNet architectures and their pretrained weights, so loading and modifying a model takes only a few lines.

📐

Syntax

Here is the typical syntax to use a pretrained ResNet model for classification:

models.resnet50(weights=models.ResNet50_Weights.DEFAULT): Loads ResNet-50 with pretrained ImageNet weights (models is the alias for torchvision.models).
model.fc = torch.nn.Linear(model.fc.in_features, num_classes): Replaces the final fully connected layer to match your number of classes.
Optimizer and loss function: Used to train or fine-tune the modified model on your dataset.

python

import torch
import torchvision.models as models

# Load pretrained ResNet-50
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Replace final layer for 10 classes
model.fc = torch.nn.Linear(model.fc.in_features, 10)

💻

Example

This example shows how to load ResNet-18 pretrained on ImageNet, replace its final layer for 3 classes, and run a forward pass on a dummy image tensor.

python

import torch
import torchvision.models as models

# Load pretrained ResNet-18
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Replace final layer for 3 classes
model.fc = torch.nn.Linear(model.fc.in_features, 3)

# Create dummy input (batch size 1, 3 color channels, 224x224 image)
dummy_input = torch.randn(1, 3, 224, 224)

# Forward pass
output = model(dummy_input)
print('Output shape:', output.shape)
print('Output values:', output)

Output

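Output shape: torch.Size([1, 3])
Output values: tensor([[ 0.1234, -0.5678,  0.9101]], grad_fn=<AddmmBackward0>)

The exact values will differ on every run, because both the dummy input and the new final layer are randomly initialized.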
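The raw outputs are logits, one per class, not probabilities. A small sketch of turning them into a prediction is shown here. The logit values and the `class_names` list are made up for illustration.

```python
import torch

# Stand-in logits for one image and 3 classes (values chosen for illustration)
logits = torch.tensor([[0.5, -1.0, 2.0]])

# Softmax over the class dimension turns logits into probabilities summing to 1
probs = torch.softmax(logits, dim=1)

# The predicted class is the index with the highest probability
pred = probs.argmax(dim=1)

class_names = ['cat', 'dog', 'bird']  # hypothetical label names
print(class_names[pred.item()])  # prints: bird
```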

⚠️

Common Pitfalls

Not replacing the final layer leaves the model with 1000 ImageNet outputs. Depending on the loss function and label format, this either raises a shape mismatch error or silently trains against the wrong output space.
Forgetting to set the model to train() mode during training or eval() mode during evaluation changes how batch normalization behaves (and dropout, in models that use it), which distorts results.
Not normalizing input images with the same mean and std used when ResNet was pretrained on ImageNet leads to poor performance.
Using too high a learning rate when fine-tuning can cause training to diverge.

python

import torch
import torchvision.models as models

# Wrong: Using pretrained ResNet without changing final layer for 5 classes
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
# model.fc still outputs 1000 classes (ImageNet)

# Right: Replace final layer
model.fc = torch.nn.Linear(model.fc.in_features, 5)

# Set model to train mode
model.train()
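The normalization pitfall can be avoided with a preprocessing step like the sketch below. The mean and std values are the ImageNet channel statistics used with torchvision's pretrained ResNets. The random `raw` tensor is a hypothetical stand-in for a decoded RGB image.

```python
import torch
from torchvision import transforms

# ImageNet channel statistics (RGB order)
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

normalize = transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD)

# Hypothetical stand-in for a decoded RGB image: uint8, shape (3, 224, 224)
raw = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)

scaled = raw.float() / 255.0   # scale pixel values to [0, 1]
image = normalize(scaled)      # per channel: (x - mean) / std
batch = image.unsqueeze(0)     # add batch dimension -> (1, 3, 224, 224)

print(batch.shape)
```

With real image files, resizing and cropping would come before this step. The weight enums also expose a ready-made preprocessing preset via `models.ResNet18_Weights.DEFAULT.transforms()`, which may be simpler than building the pipeline by hand.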

📊

Quick Reference

Step	Description
Load pretrained ResNet	Use torchvision.models.resnetXX(weights=models.ResNetXX_Weights.DEFAULT)
Modify final layer	Replace model.fc with torch.nn.Linear(model.fc.in_features, num_classes)
Prepare data	Normalize images with ImageNet mean and std
Train or fine-tune	Set model.train(), then update the weights with an optimizer and a loss function
Evaluate	Set model.eval() and disable gradients with torch.no_grad()

✅

Key Takeaways

Always replace ResNet's final layer so its output size matches the number of classes in your task.

Normalize input images using ImageNet mean and standard deviation.

Set model.train() during training and model.eval() during evaluation.

Use pretrained weights to speed up training and improve accuracy.

Use a small learning rate when fine-tuning; too high a rate can make training diverge.