Geometric transforms (rotate, flip, crop) in Computer Vision - Model Pipeline Trace

Model Pipeline - Geometric transforms (rotate, flip, crop)

This pipeline shows how an image changes step by step under three geometric transforms: rotate, flip, and crop. Used as data augmentation, these transforms show the model different views of the same image, which helps it generalize to images it has not seen.

Data Flow - 4 Stages

Stage 1: Input Image

1 image x 256 height x 256 width x 3 channels → Original image loaded → 1 image x 256 height x 256 width x 3 channels

A colorful photo of a cat

↓

Stage 2: Rotate

1 image x 256 height x 256 width x 3 channels → Rotate image 90 degrees clockwise → 1 image x 256 height x 256 width x 3 channels

Cat image rotated so head points right instead of up
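
The rotate stage can be sketched with NumPy as follows. This is a minimal illustration, not the pipeline's actual code: the random array stands in for the cat photo, and the helper name rotate_90_clockwise is an assumption made for this example.

```python
import numpy as np

# Stand-in for the 256 x 256 RGB cat photo (random pixels, an assumption).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)


def rotate_90_clockwise(img: np.ndarray) -> np.ndarray:
    """Rotate an H x W x C image 90 degrees clockwise (hypothetical helper)."""
    # np.rot90 turns counterclockwise for positive k, so k=-1 is clockwise.
    # axes=(0, 1) rotates in the height/width plane and leaves channels alone.
    return np.rot90(img, k=-1, axes=(0, 1))


rotated = rotate_90_clockwise(image)
print(rotated.shape)  # (256, 256, 3)

# What used to be the top row (where the head pointed) is now the right column.
print(np.array_equal(rotated[:, -1], image[0]))  # True
```

For a non-square input the height and width would swap; the shape is unchanged here only because the image is square.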

↓

Stage 3: Flip

1 image x 256 height x 256 width x 3 channels → Flip image horizontally (left to right) → 1 image x 256 height x 256 width x 3 channels

Cat image mirrored so left ear is now on right
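
A horizontal flip amounts to reversing the width axis. The sketch below assumes an image stored as a height x width x channels NumPy array; the random array and the helper name flip_horizontal are placeholders for illustration.

```python
import numpy as np

# Stand-in for the 256 x 256 RGB image (random pixels, an assumption).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)


def flip_horizontal(img: np.ndarray) -> np.ndarray:
    """Mirror an H x W x C image left to right (hypothetical helper)."""
    # Reverse axis 1 (width); rows and color channels keep their order.
    return img[:, ::-1, :]


flipped = flip_horizontal(image)
print(flipped.shape)  # (256, 256, 3)

# The leftmost column of the result is the rightmost column of the input.
print(np.array_equal(flipped[:, 0], image[:, -1]))  # True
```

Flipping twice returns the original image, which is a quick sanity check for this kind of transform.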

↓

Stage 4: Crop

1 image x 256 height x 256 width x 3 channels → Crop center 128x128 pixels → 1 image x 128 height x 128 width x 3 channels

Central 128x128 region of the image, showing the cat's face; the output has half the height and width of the input
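
The center crop is plain array slicing once the offsets are known. A minimal sketch, assuming a height x width x channels NumPy array; center_crop is a hypothetical helper name and the random array stands in for the photo.

```python
import numpy as np

# Stand-in for the 256 x 256 RGB image (random pixels, an assumption).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)


def center_crop(img: np.ndarray, size: int) -> np.ndarray:
    """Return the central size x size window of an H x W x C image."""
    h, w = img.shape[:2]
    if size > h or size > w:
        raise ValueError("crop size must not exceed the image size")
    # Offsets that leave an equal margin on both sides of the window.
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]


cropped = center_crop(image, 128)
print(cropped.shape)  # (128, 128, 3)
```

With a 256x256 input and a 128x128 window, the offsets are 64 pixels on each side, so the crop covers rows and columns 64 through 191.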

Training Trace - Epoch by Epoch

Loss (points rounded to the 0.2 grid; exact values are in the table below)
1.0 |
0.8 | *
0.6 |   * *
0.4 |         * *
0.2 |
0.0 +-----------
      1 2 3 4 5
       Epochs

Epoch	Loss ↓	Accuracy ↑	Observation
1	0.85	0.55	Model starts learning with high loss and low accuracy
2	0.65	0.70	Loss decreases and accuracy improves after seeing transformed images
3	0.50	0.80	Model learns better features from augmented images
4	0.40	0.85	Loss continues to drop, accuracy rises steadily
5	0.35	0.88	Model converges with good accuracy using geometric transforms

Prediction Trace - 5 Layers

Layer 1: Input Image

Layer 2: Rotate 90 degrees clockwise

Layer 3: Flip horizontally

Layer 4: Crop center 128x128

Layer 5: Model prediction
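
The five steps above can be chained into one function that records the tensor shape after each step. This is a sketch under stated assumptions: run_pipeline and dummy_model are hypothetical names, and dummy_model is only a placeholder that always answers "cat", not a trained classifier.

```python
import numpy as np


def run_pipeline(img: np.ndarray, model, crop_size: int = 128):
    """Apply rotate, flip, and center crop, then call the model (sketch)."""
    trace = [("input", img.shape)]

    img = np.rot90(img, k=-1, axes=(0, 1))  # 90 degrees clockwise
    trace.append(("rotate", img.shape))

    img = img[:, ::-1, :]  # horizontal flip
    trace.append(("flip", img.shape))

    h, w = img.shape[:2]
    top, left = (h - crop_size) // 2, (w - crop_size) // 2
    img = img[top:top + crop_size, left:left + crop_size]  # center crop
    trace.append(("crop", img.shape))

    return model(img), trace


def dummy_model(img: np.ndarray) -> str:
    """Placeholder for a trained classifier (assumption): always says 'cat'."""
    return "cat"


# Random pixels stand in for the cat photo.
image = np.random.default_rng(0).integers(0, 256, (256, 256, 3), dtype=np.uint8)
prediction, trace = run_pipeline(image, dummy_model)
for name, shape in trace:
    print(name, shape)
print("prediction:", prediction)
```

Only the crop step changes the shape, so the model receives a 128 x 128 x 3 input.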

Model Quiz - 3 Questions

Test your understanding

What happens to the image size after cropping?

A. It becomes larger

B. It stays the same size

C. It becomes smaller in height and width

D. It loses color channels

Key Insight

Geometric transforms like rotate, flip, and crop let the model see different views of the same image. This variety encourages the model to recognize an object regardless of its orientation, mirroring, or framing, which typically improves accuracy on unseen images and reduces overfitting.
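
In training, the variety usually comes from sampling the transform parameters at random for each image rather than applying one fixed sequence. The sketch below is an assumption about how that might look, not the method used for the trace above: random_augment is a hypothetical helper, and the choices of 90-degree rotation steps, a 0.5 flip probability, and a random 128x128 crop window are illustrative.

```python
import numpy as np


def random_augment(img: np.ndarray, rng: np.random.Generator,
                   crop_size: int = 128) -> np.ndarray:
    """Return one randomly rotated, flipped, and cropped view (sketch)."""
    # Rotate clockwise by a random multiple of 90 degrees (0, 90, 180, 270).
    k = int(rng.integers(0, 4))
    out = np.rot90(img, k=-k, axes=(0, 1))

    # Flip horizontally about half of the time.
    if rng.random() < 0.5:
        out = out[:, ::-1, :]

    # Pick a random crop window that fits inside the image.
    h, w = out.shape[:2]
    top = int(rng.integers(0, h - crop_size + 1))
    left = int(rng.integers(0, w - crop_size + 1))
    return out[top:top + crop_size, left:left + crop_size]


# Random pixels stand in for the cat photo.
image = np.random.default_rng(0).integers(0, 256, (256, 256, 3), dtype=np.uint8)

rng = np.random.default_rng(42)
views = [random_augment(image, rng) for _ in range(4)]
print([v.shape for v in views])
```

Passing the generator in explicitly keeps the augmentation reproducible: two generators created with the same seed produce the same sequence of views.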