Geometric transforms (rotate, flip, crop) in Computer Vision - Model Pipeline Trace

Model Pipeline - Geometric transforms (rotate, flip, crop)

This pipeline shows how an image changes step by step under three geometric transforms: rotate, flip, and crop. Applied to training images, these transforms show the model different views of the same picture, which helps it learn features that do not depend on one orientation or framing.

Data Flow - 4 Stages
1. Input Image
   Input:     1 image x 256 height x 256 width x 3 channels
   Operation: Original image loaded
   Output:    1 image x 256 height x 256 width x 3 channels
   Example:   A colorful photo of a cat

2. Rotate
   Input:     1 image x 256 height x 256 width x 3 channels
   Operation: Rotate image 90 degrees clockwise
   Output:    1 image x 256 height x 256 width x 3 channels
   Example:   Cat image rotated so head points right instead of up

3. Flip
   Input:     1 image x 256 height x 256 width x 3 channels
   Operation: Flip image horizontally (left to right)
   Output:    1 image x 256 height x 256 width x 3 channels
   Example:   Cat image mirrored so left ear is now on right

4. Crop
   Input:     1 image x 256 height x 256 width x 3 channels
   Operation: Crop center 128x128 pixels
   Output:    1 image x 128 height x 128 width x 3 channels
   Example:   Zoomed-in cat face in center of image
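
The four stages above can be sketched with NumPy array operations. This is a minimal illustration, not the code behind this page: the function names are made up for the example, the cat photo is replaced by random pixels, and the leading "1 image" batch dimension is left out, so shapes are height x width x channels.

```python
import numpy as np

def rotate_90_clockwise(image):
    # np.rot90 rotates counterclockwise for positive k, so k=-1 turns clockwise.
    return np.rot90(image, k=-1, axes=(0, 1))

def flip_horizontal(image):
    # Reverse the width axis (axis 1) to mirror left and right.
    return image[:, ::-1, :]

def center_crop(image, size):
    # Keep a size x size window centered in the image.
    height, width = image.shape[:2]
    top = (height - size) // 2
    left = (width - size) // 2
    return image[top:top + size, left:left + size, :]

# Stand-in for the cat photo (assumption): random pixel values, 256 x 256 x 3.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)

rotated = rotate_90_clockwise(image)
flipped = flip_horizontal(rotated)
cropped = center_crop(flipped, 128)

for name, array in [("input", image), ("rotate", rotated),
                    ("flip", flipped), ("crop", cropped)]:
    print(name, array.shape)
```

Rotate and flip only rearrange pixels, so the shape stays 256 x 256 x 3. Crop is the only stage that changes the shape, to 128 x 128 x 3.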
Training Trace - Epoch by Epoch
Loss (approximate; exact values are in the table below)
1.0 |
0.8 | *
0.6 |   * *
0.4 |       * *
0.2 |
0.0 +-----------
      1 2 3 4 5
       Epochs
Epoch  Loss ↓  Accuracy ↑  Observation
1      0.85    0.55        Model starts learning with high loss and low accuracy
2      0.65    0.70        Loss decreases and accuracy improves after seeing transformed images
3      0.50    0.80        Model learns better features from augmented images
4      0.40    0.85        Loss continues to drop, accuracy rises steadily
5      0.35    0.88        Model converges with good accuracy using geometric transforms
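
One way to read the trace is to look at how much each epoch changes the numbers. The short sketch below uses only the loss and accuracy values listed above; the helper name `epoch_deltas` is made up for this example.

```python
# Values copied from the training trace above.
loss = [0.85, 0.65, 0.50, 0.40, 0.35]
accuracy = [0.55, 0.70, 0.80, 0.85, 0.88]

def epoch_deltas(values):
    # Change from each epoch to the next, rounded to hide float noise.
    return [round(b - a, 2) for a, b in zip(values, values[1:])]

loss_deltas = epoch_deltas(loss)
accuracy_deltas = epoch_deltas(accuracy)

print("loss change per epoch:    ", loss_deltas)
print("accuracy change per epoch:", accuracy_deltas)
```

The steps get smaller every epoch (loss drops by 0.20, then 0.15, 0.10, and 0.05), which is the pattern the last row describes as converging.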
Prediction Trace - 5 Layers
Layer 1: Input Image
Layer 2: Rotate 90 degrees clockwise
Layer 3: Flip horizontally
Layer 4: Crop center 128x128
Layer 5: Model prediction
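
The five layers above can be sketched as one function that transforms the image and then passes it to a model. The model here is a hypothetical placeholder (`dummy_model` returns a made-up score, not a real prediction), and `preprocess` and `predict` are illustrative names; the page does not say which model or framework is used.

```python
import numpy as np

def preprocess(image):
    # Layers 2-4: rotate 90 degrees clockwise, flip horizontally,
    # then crop the center 128x128 pixels.
    rotated = np.rot90(image, k=-1, axes=(0, 1))
    flipped = rotated[:, ::-1, :]
    top = (flipped.shape[0] - 128) // 2
    left = (flipped.shape[1] - 128) // 2
    return flipped[top:top + 128, left:left + 128, :]

def predict(image, model):
    # Layer 5: hand the transformed image to the model as a batch of one.
    batch = preprocess(image)[np.newaxis, ...]
    return model(batch)

def dummy_model(batch):
    # Hypothetical stand-in for a trained classifier: one score per image
    # (mean brightness scaled to [0, 1]).
    return batch.mean(axis=(1, 2, 3)) / 255.0

# Layer 1: input image (a flat gray placeholder instead of a real photo).
image = np.full((256, 256, 3), 128, dtype=np.uint8)
scores = predict(image, dummy_model)
print(scores.shape)
```

Any callable that accepts a batch shaped 1 x 128 x 128 x 3 could take the place of `dummy_model`.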
Model Quiz - 3 Questions
Test your understanding
What happens to the image size after cropping?
A. It becomes larger
B. It stays the same size
C. It becomes smaller in height and width
D. It loses color channels
Key Insight
Geometric transforms such as rotate, flip, and crop let the model see different views of the same image. This variety teaches the model to recognize objects regardless of orientation or framing, which can improve accuracy on new images and reduce overfitting.
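
To make "different views of the same image" concrete, the sketch below builds eight views from one picture by combining quarter-turn rotations with an optional horizontal flip, then center-cropping each one. Going beyond the single 90-degree rotation used in this pipeline is an assumption made for illustration, and `make_views` is a made-up name.

```python
import numpy as np

def center_crop(image, size):
    # Keep a size x size window centered in the image.
    top = (image.shape[0] - size) // 2
    left = (image.shape[1] - size) // 2
    return image[top:top + size, left:left + size, :]

def make_views(image, crop_size=128):
    # Combine four rotations (0, 90, 180, 270 degrees clockwise) with an
    # optional horizontal flip, then center-crop each result.
    views = []
    for quarter_turns in range(4):
        rotated = np.rot90(image, k=-quarter_turns, axes=(0, 1))
        for flip in (False, True):
            view = rotated[:, ::-1, :] if flip else rotated
            views.append(center_crop(view, crop_size))
    return views

# Stand-in for a real photo (assumption): random pixel values.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)

views = make_views(image)
print(len(views), views[0].shape)
```

Each view contains the same content in a different arrangement, so a model trained on all of them is pushed to recognize the object rather than one specific layout.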