How to Use VGG Model in PyTorch: Syntax and Example
To use VGG in PyTorch, import it from torchvision.models and load a pretrained version with vgg16(pretrained=True). You can then use it for feature extraction or fine-tuning by passing input tensors through the model.
Syntax
The VGG model is available in PyTorch's torchvision.models module. You can load a pretrained VGG16 model using vgg16(pretrained=True). The model expects input images as tensors of shape (batch_size, 3, 224, 224) with pixel values normalized using the ImageNet mean and standard deviation.
- vgg16(pretrained=True): Loads the VGG16 model with pretrained weights on ImageNet.
- model.eval(): Sets the model to evaluation mode for inference.
- model(input_tensor): Runs the input through the model to get predictions.
```python
from torchvision.models import vgg16
import torch

# Load pretrained VGG16 model
model = vgg16(pretrained=True)

# Set model to evaluation mode
model.eval()

# Example input tensor with batch size 1, 3 color channels, 224x224 image
input_tensor = torch.randn(1, 3, 224, 224)

# Get model output
output = model(input_tensor)
```
Example
This example shows how to load the pretrained VGG16 model, prepare a random input tensor, run the model to get predictions, and print the output shape. The output is a tensor of shape (1, 1000), one score per ImageNet class.
```python
from torchvision.models import vgg16
import torch

# Load pretrained VGG16 model
model = vgg16(pretrained=True)
model.eval()

# Create a random input tensor simulating a batch of 1 RGB image 224x224
input_tensor = torch.randn(1, 3, 224, 224)

# Run the model
output = model(input_tensor)

# Print output shape and first 5 scores
print('Output shape:', output.shape)
print('First 5 class scores:', output[0, :5])
```
Output
Output shape: torch.Size([1, 1000])
First 5 class scores: tensor([ 0.1234, -0.5678, 1.2345, -0.3456, 0.7890], grad_fn=<SliceBackward0>)
Common Pitfalls
- Not setting the model to eval() mode before inference can cause inconsistent results due to dropout layers (and batch normalization, in the vgg16_bn variant).
- Input images must be normalized with the same mean and standard deviation used during training (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]).
- Input tensor shape must be (batch_size, 3, 224, 224). Passing images without resizing, or with the wrong channel order, causes errors.
- For fine-tuning, remember to set model.train() and adjust the final classifier layer if needed.
```python
from torchvision import transforms
from PIL import Image
import torch
from torchvision.models import vgg16

# Correct preprocessing pipeline
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

# Load image
image = Image.new('RGB', (300, 300))  # Dummy image
input_tensor = preprocess(image).unsqueeze(0)  # Add batch dimension

model = vgg16(pretrained=True)
model.eval()

output = model(input_tensor)
print('Output shape:', output.shape)
```
Output
Output shape: torch.Size([1, 1000])
Quick Reference
Summary tips for using VGG in PyTorch:
- Import from torchvision.models and load pretrained weights with vgg16(pretrained=True).
- Preprocess input images: resize, crop, convert to tensor, normalize.
- Set the model to eval() for inference or train() for fine-tuning.
- Output is a tensor of 1000 class scores for ImageNet.
Key Takeaways
Load VGG16 pretrained model using torchvision.models.vgg16(pretrained=True).
Always preprocess input images with resizing, cropping, and normalization before passing to VGG.
Set model.eval() before inference to disable dropout (and batch norm updates, in the vgg16_bn variant).
Output tensor shape is (batch_size, 1000) representing ImageNet class scores.
For fine-tuning, modify the classifier layer and set model.train() mode.