Pre-trained models let you reuse image models that have already been trained on a large set of images. This saves time and compute when you build your own image tasks.
torchvision pre-trained models in PyTorch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()
Set weights=models.ResNet18_Weights.DEFAULT to load a model with weights learned on ImageNet data.
Call model.eval() to set the model to evaluation mode before using it for predictions.
import torchvision.models as models

# Load a pre-trained ResNet18 model
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

import torchvision.models as models

# Load a pre-trained MobileNetV2 model
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
model.eval()

import torchvision.models as models

# Load a pre-trained DenseNet121 model
model = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)
model.eval()
This program loads a pre-trained ResNet18 model, downloads an image of a dog, preprocesses it, and predicts the class with confidence.
import torch
from torchvision import models, transforms
from PIL import Image
import requests

# Load a pre-trained ResNet18 model
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# Define image transforms to prepare input
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Download an example image (convert to RGB so the model always gets 3 channels)
url = 'https://upload.wikimedia.org/wikipedia/commons/9/9a/Pug_600.jpg'
image = Image.open(requests.get(url, stream=True).raw).convert('RGB')

# Preprocess the image
input_tensor = preprocess(image)
input_batch = input_tensor.unsqueeze(0)  # create batch dimension

# Run the model on the input
with torch.no_grad():
    output = model(input_batch)

# Load ImageNet class names
labels_url = 'https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt'
labels = requests.get(labels_url).text.splitlines()

# Get the predicted class index
probabilities = torch.nn.functional.softmax(output[0], dim=0)
confidence, class_idx = torch.max(probabilities, dim=0)

# Print the result
print(f'Predicted class: {labels[class_idx]}')
print(f'Confidence: {confidence.item():.4f}')
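The program above reports only the single most likely class. A common variation is to list the top few candidates, which can be sketched with `torch.topk`. The function name `top_k_predictions` and the four-element logits and labels below are hypothetical stand-ins for real model output and ImageNet class names.

```python
import torch


def top_k_predictions(logits, labels, k=5):
    """Return the k most likely (label, probability) pairs for one image."""
    probabilities = torch.softmax(logits, dim=0)
    top_probs, top_idxs = torch.topk(probabilities, k)
    return [(labels[i], p) for p, i in zip(top_probs.tolist(), top_idxs.tolist())]


# Hypothetical logits and labels standing in for model output and class names.
example_logits = torch.tensor([0.1, 2.0, 0.5, 3.0])
example_labels = ["cat", "dog", "fox", "pug"]

top2 = top_k_predictions(example_logits, example_labels, k=2)
for label, prob in top2:
    print(f"{label}: {prob:.4f}")
```

With a real model, `output[0]` and `labels` from the program above could be passed in place of the example values.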
The pre-trained classification models shown here were trained on ImageNet, which has 1000 classes, so their predictions are limited to those classes.
Always preprocess images the same way the model expects (resize, crop, normalize).
Call model.eval() to switch off training-only behavior such as dropout and batch norm updates, so predictions are consistent.
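The effect of evaluation mode is easiest to see on a layer that behaves randomly during training. The sketch below uses a tiny hypothetical network with dropout, not a torchvision model, and runs the same input twice in each mode.

```python
import torch
from torch import nn

torch.manual_seed(0)

# A tiny hypothetical network with dropout, standing in for a larger model.
net = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 2))
x = torch.ones(1, 4)

# Training mode: dropout randomly zeroes activations on every call.
net.train()
train_a, train_b = net(x), net(x)

# Evaluation mode: dropout is disabled, so repeated calls agree.
net.eval()
with torch.no_grad():
    eval_a, eval_b = net(x), net(x)

print("train outputs equal:", torch.equal(train_a, train_b))
print("eval outputs equal: ", torch.equal(eval_a, eval_b))
```

In training mode the two outputs will usually differ because each call draws a new dropout mask. In evaluation mode they match.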
Pre-trained models let you use powerful image models without training from scratch.
They are great for quick experiments, small datasets, or as a starting point for your own tasks.
Remember to preprocess images correctly and set the model to evaluation mode before predicting.
Practice
torchvision pre-trained models?
Solution
Step 1: Understand what pre-trained models do
Pre-trained models are already trained on large datasets, so you don't need to train them from zero.
Step 2: Identify the main benefit
This saves time and resources, letting you use powerful models quickly.
Final Answer:
They allow you to use powerful image models without training from scratch. -> Option D
Quick Check:
Pre-trained models = reuse trained weights [OK]
- Thinking pre-trained models improve data quality
- Confusing pre-trained models with image resizing
- Believing they generate images from text
Solution
Step 1: Recall the updated torchvision syntax
Since torchvision 0.13, pre-trained weights are loaded using the 'weights' argument with a weights enum, not 'pretrained=True'.
Step 2: Identify the correct syntax for ResNet18
Use torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1).
Final Answer:
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1) -> Option A
Quick Check:
Use weights=enum, not pretrained=True [OK]
- Using pretrained=False which doesn't load pre-trained weights
- Calling torchvision.resnet18 directly
- Using a non-existent load_resnet18 function
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.eval()
inputs = torch.randn(8, 3, 224, 224)
outputs = model(inputs)
print(outputs.shape)
Solution
Step 1: Understand ResNet18 output size
ResNet18 pre-trained on ImageNet outputs logits for 1000 classes, so output shape is (batch_size, 1000).
Step 2: Check input batch size and output shape
Input batch size is 8, so output shape is (8, 1000).
Final Answer:
torch.Size([8, 1000]) -> Option B
Quick Check:
Batch size 8, 1000 classes output [OK]
- Confusing output with input image shape
- Expecting feature vector size instead of class logits
- Assuming batch size 1 output
import torch
import torchvision

model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1)
inputs = torch.randn(1, 3, 224, 224)
outputs = model(inputs)

What is the likely mistake?
Solution
Step 1: Check model mode for prediction
Pre-trained models must be set to evaluation mode with model.eval() to disable dropout and batch norm updates.
Step 2: Identify the missing step
The code misses model.eval(), so outputs may be incorrect or inconsistent.
Final Answer:
You forgot to call model.eval() before prediction. -> Option C
Quick Check:
Set model.eval() before inference [OK]
- Not calling model.eval() before inference
- Wrong input tensor shape without batch
- Trying to convert tensors to numpy before model
Solution
Step 1: Identify the final layer of ResNet18
ResNet18's final fully connected layer is model.fc with input features 512 and output 1000 classes.
Step 2: Replace final layer for 5 classes
To fine-tune, replace model.fc with a new Linear layer with 512 inputs and 5 outputs.
Final Answer:
model.fc = torch.nn.Linear(in_features=512, out_features=5) -> Option A
Quick Check:
Replace model.fc with correct output size [OK]
- Replacing wrong attribute like model.classifier
- Using wrong input feature size (2048 instead of 512)
- Not changing output features to dataset classes
