torchvision pre-trained models in PyTorch - Model Pipeline Trace

Model Pipeline - torchvision pre-trained models

This pipeline uses a pre-trained model from torchvision to classify images. It starts with input images, processes them through the model, and outputs predicted labels with confidence scores.

Data Flow - 4 Stages
Stage 1: Input Images
Input: 1000 images x 3 channels x 224 height x 224 width
Operation: Raw images are loaded and resized to 224x224 pixels with 3 color channels (RGB)
Output: 1000 images x 3 x 224 x 224
Example: An image of a cat resized to 224x224 pixels with RGB channels
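Stage 2: Preprocessing
Input: 1000 images x 3 x 224 x 224
Operation: Normalize pixel values (already scaled to the range [0, 1]) per channel using mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225], the ImageNet channel statistics
Output: 1000 images x 3 x 224 x 224
Example: Cat image pixels shifted and scaled per channel so that values are roughly zero-mean with unit variance; the match is exact only on average over ImageNet, not for a single image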
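The normalization in this stage is plain per-channel arithmetic: subtract the channel mean, then divide by the channel standard deviation. The sketch below shows it for a single pixel in pure Python; the helper name `normalize_pixel` and the sample pixel are illustrative assumptions, not part of the original pipeline.

```python
# Per-channel ImageNet statistics quoted in the preprocessing stage.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Normalize one RGB pixel whose values are already scaled to [0, 1]."""
    return [(value - m) / s for value, m, s in zip(rgb, MEAN, STD)]


# A hypothetical mid-gray pixel stored as 8-bit values.
raw_pixel = [128, 128, 128]

# Scale from [0, 255] to [0, 1] before applying the mean and std.
scaled = [value / 255 for value in raw_pixel]
normalized = normalize_pixel(scaled)
print(normalized)
```

On whole image tensors, `torchvision.transforms.Normalize(mean, std)` applies the same formula to every pixel of each channel.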
Stage 3: Feature Extraction (Pre-trained Model)
Input: 1000 images x 3 x 224 x 224
Operation: Pass images through ResNet-18 pre-trained on ImageNet to extract features and classify
Output: 1000 images x 1000 classes
Example: The model outputs a vector of 1000 raw class scores (logits) for each image
Stage 4: Prediction
Input: 1000 images x 1000 classes
Operation: Apply softmax to convert class scores to probabilities
Output: 1000 images x 1000 classes (probabilities)
Example: For a cat image, the highest probability might be for class 'tabby cat' with 0.85 confidence
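Softmax exponentiates each score and divides by the sum, so the outputs are positive and add up to 1. The sketch below works through it for three made-up scores rather than the full 1000; the function and the values are illustrative assumptions.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(score - shift) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical scores for three classes.
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)

# The predicted class is the index with the highest probability.
predicted = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted)
```

For a batch of model outputs, `torch.softmax(logits, dim=1)` computes the same thing row by row.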
Training Trace - Epoch by Epoch

[Plot: training loss (y-axis, 0.0 to 1.0) versus epoch (x-axis, 1 to 5)]
Epoch | Loss ↓ | Accuracy ↑ | Observation
1 | 0.75 | 0.65 | Initial training loss and accuracy using fine-tuning on a small dataset
2 | 0.50 | 0.78 | Loss decreased and accuracy improved after second epoch
3 | 0.35 | 0.85 | Model continues to learn, showing better predictions
4 | 0.28 | 0.89 | Training converging with lower loss and higher accuracy
5 | 0.22 | 0.92 | Final epoch shows good performance on training data
Prediction Trace - 5 Layers
Layer 1: Input Image
Layer 2: ResNet-18 Convolutional Layers
Layer 3: Fully Connected Layer
Layer 4: Softmax Activation
Layer 5: Prediction Output
Model Quiz - 3 Questions
Test your understanding
What is the shape of the output after the pre-trained model processes the input images?
A. 1000 images x 3 channels x 224 x 224
B. 1000 images x 1000 classes
C. 1000 images x 512 features
D. 1000 images x 25088 features
Key Insight
Using torchvision pre-trained models allows quick and effective image classification by reusing features learned from large datasets such as ImageNet. The model passes each image through its convolutional layers to extract features, then outputs class scores that softmax converts to probabilities. Fine-tuning on a new dataset improves accuracy as training progresses.