Computer Vision · ~8 mins

Geometric transforms (rotate, flip, crop) in Computer Vision - Model Metrics & Evaluation

Which metric matters for Geometric Transforms and WHY

When using geometric transforms like rotate, flip, and crop in computer vision, the key metric to watch is the model's accuracy (or related performance measures) on held-out validation data. These transforms change the training images so the model learns features that do not depend on orientation or framing; the validation metrics tell us whether that actually improved the model's ability to recognize objects correctly.

Robustness also matters: the model should still perform well when images arrive rotated or flipped in the real world. Metrics like accuracy, precision, recall, or F1 score computed on transformed test images tell us whether the model generalized from these geometric changes.

Confusion Matrix Example

Suppose we have a model classifying images after applying flips and rotations. Here is a confusion matrix showing results on a test set:

|          | Predicted Cat | Predicted Dog |
|----------|---------------|---------------|
| True Cat | 45            | 5             |
| True Dog | 7             | 43            |

From this matrix:

  • True Positives (TP) for Cat = 45 (cats correctly predicted as cat)
  • False Positives (FP) for Cat = 7 (dogs wrongly predicted as cat)
  • False Negatives (FN) for Cat = 5 (cats wrongly predicted as dog)
  • True Negatives (TN) for Cat = 43 (dogs correctly predicted as dog)

We can calculate precision and recall to check if the model learned well with geometric transforms.
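Reading the counts straight off the matrix above (with "cat" as the positive class), the calculation can be sketched in plain Python; libraries like scikit-learn compute the same numbers via `precision_score` and `recall_score`:

```python
# Counts taken directly from the confusion matrix, cat = positive class.
tp, fp = 45, 7   # cats correctly found; dogs mislabeled as cat
fn, tn = 5, 43   # cats mislabeled as dog; dogs correctly found

precision = tp / (tp + fp)                   # 45 / 52 ≈ 0.865
recall = tp / (tp + fn)                      # 45 / 50 = 0.900
accuracy = (tp + tn) / (tp + fp + fn + tn)   # 88 / 100 = 0.880

print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
```

Precision and recall are both high and close together here, which is what we hope to see after training with augmented images.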

Precision vs Recall Tradeoff with Geometric Transforms

Imagine a model trained with rotated and flipped images to detect cats. If the model is very strict, it might only label very clear cat images as cats, leading to:

  • High precision: Most predicted cats are really cats.
  • Low recall: It misses some cats that look different due to rotation or flip.

On the other hand, if the model tries to catch every possible cat, it might label some dogs as cats, leading to:

  • High recall: Most cats are found.
  • Low precision: More wrong cat labels.

Geometric transforms help balance this tradeoff by exposing the model to cats in many orientations and framings during training, which can improve both precision and recall at once.
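The three transforms themselves are simple array operations. Here is a minimal NumPy sketch of what an augmentation step does to an image (real pipelines typically use a library such as torchvision or albumentations, with randomized parameters):

```python
import numpy as np

# Toy 4x4 "image"; a real image would be (H, W) or (H, W, 3).
img = np.arange(16).reshape(4, 4)

flipped = img[:, ::-1]    # horizontal flip: reverse columns
rotated = np.rot90(img)   # 90-degree counter-clockwise rotation
cropped = img[1:3, 1:3]   # 2x2 crop from the interior

# Pixel values are preserved; only their positions change, so the
# class label ("cat"/"dog") stays valid for every augmented copy.
print(flipped[0], rotated[0], cropped.shape)
```

Because the label survives each transform unchanged, one labeled image yields several training examples, which is what teaches the model orientation-robust features.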

What Good vs Bad Metric Values Look Like

Good values:

  • Accuracy above 85% on transformed images.
  • Precision and recall both above 80%, showing balanced detection.
  • F1 score close to precision and recall, indicating no big tradeoff.

Bad values:

  • Accuracy below 70%, suggesting the transforms confused the model rather than helping it.
  • Precision very high but recall very low, or vice versa, showing imbalance.
  • F1 score far below the higher of precision and recall. F1 is a harmonic mean, so it is pulled toward the lower value, and a big gap signals imbalanced performance.
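A quick way to see why F1 exposes imbalance: plug a balanced and a skewed precision/recall pair into the formula F1 = 2PR / (P + R). The example values below are illustrative, not from the matrix above:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

balanced = f1(0.85, 0.83)   # stays close to both inputs: healthy
skewed = f1(0.95, 0.12)     # dragged down toward the low recall: warning sign

print(f"balanced={balanced:.2f} skewed={skewed:.2f}")
```

The skewed case scores about 0.21 even though precision is 0.95, which is exactly the "F1 far below the higher metric" red flag described above.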

Common Pitfalls with Metrics and Geometric Transforms

  • Ignoring data leakage: Using the same transformed images in training and testing can falsely boost accuracy.
  • Overfitting: Model learns to recognize only specific rotations or flips seen in training, not generalizing well.
  • Accuracy paradox: High accuracy might hide poor recall if classes are imbalanced.
  • Not validating on original images: Model might perform well on transformed images but poorly on real-world data.
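The data-leakage pitfall has a simple structural fix: split the data before augmenting, so no transformed copy of a test image ever reaches the training set. A minimal sketch (function name and augment callback are illustrative):

```python
import random

def split_then_augment(images, labels, augment, test_frac=0.2, seed=0):
    """Split BEFORE augmenting, so the test split stays untouched."""
    idx = list(range(len(images)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_frac))
    train_idx, test_idx = idx[:cut], idx[cut:]

    # Augment only the training split; keep the originals as well.
    train = [(augment(images[i]), labels[i]) for i in train_idx]
    train += [(images[i], labels[i]) for i in train_idx]
    test = [(images[i], labels[i]) for i in test_idx]
    return train, test

# Toy demo: integers stand in for images, x + 100 stands in for a transform.
images = list(range(10))
labels = ["cat" if i < 5 else "dog" for i in range(10)]
train, test = split_then_augment(images, labels, augment=lambda x: x + 100)
```

Doing it in the other order, augmenting first and splitting second, risks putting a flipped copy of a test image into training, which inflates every metric.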

Self Check

Your model trained with rotated and flipped images has 98% accuracy but only 12% recall on the cat class. Is it good for production?

Answer: No, it is not good. The very low recall means the model misses most cats, which is a big problem if you want to detect cats reliably. High accuracy can be misleading if most images are not cats. You should improve recall by adjusting training or using more diverse transforms.
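To see how high accuracy and terrible recall coexist, work through one hypothetical class balance (the counts below are chosen to roughly reproduce the numbers in the question, not taken from any real dataset):

```python
# Hypothetical imbalanced test set: 1000 images, only 25 are cats.
total, cats = 1000, 25
found_cats = 3   # the model finds just 3 of 25 cats -> 12% recall

# Assume every dog is classified correctly, so the only errors
# are the 22 missed cats.
correct = (total - cats) + found_cats   # 975 + 3 = 978
accuracy = correct / total              # 0.978, i.e. ~98%
recall = found_cats / cats              # 3 / 25 = 0.12

print(f"accuracy={accuracy:.1%} recall={recall:.0%}")
```

The dogs dominate the denominator, so accuracy stays near 98% while the model misses 88% of the cats. This is the accuracy paradox from the pitfalls list.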

Key Result
For geometric transforms, balanced precision and recall on transformed images show the model learned robust features.