CNN architecture for image classification in TensorFlow - Model Metrics & Evaluation

Which metric matters for CNN image classification, and why

For image classification with CNNs, accuracy is usually the first metric to check. It is the fraction of all images that the model labels correctly.

However, if the classes are imbalanced (some classes have many images, others few), accuracy can be misleading. In that case, per-class precision, recall, and F1 score are needed to understand how well the model detects each category.

For example, if the model misses many images of a rare class, recall for that class will be low, even if overall accuracy is high.
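To make the gap between accuracy and recall concrete, here is a minimal sketch in plain Python. The labels are made-up illustrative data (95 images of a common class, 5 of a rare one), the "model" simply predicts the common class every time, and the helper names `accuracy` and `recall_for` are hypothetical.

```python
# Hypothetical labels: 95 "common" images and 5 "rare" images.
y_true = ["common"] * 95 + ["rare"] * 5
# A degenerate model that always predicts the majority class.
y_pred = ["common"] * 100


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall_for(label, y_true, y_pred):
    """Of all images whose true class is `label`, the fraction predicted as `label`."""
    preds_for_label = [p for t, p in zip(y_true, y_pred) if t == label]
    if not preds_for_label:
        return 0.0  # no images of this class in the data
    return sum(p == label for p in preds_for_label) / len(preds_for_label)


print(accuracy(y_true, y_pred))              # 0.95
print(recall_for("rare", y_true, y_pred))    # 0.0
print(recall_for("common", y_true, y_pred))  # 1.0
```

The model scores 95% accuracy while finding none of the rare images, which is exactly the situation per-class recall is meant to expose.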

Confusion matrix example

Suppose we have 3 classes: Cat, Dog, and Bird. Here is a confusion matrix from a CNN model, where each row is the actual class and each column is the predicted class:

| Actual \ Predicted | Cat | Dog | Bird |
|--------------------|-----|-----|------|
| Cat                | 50  | 2   | 3    |
| Dog                | 4   | 45  | 1    |
| Bird               | 2   | 3   | 40   |

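From this, we can calculate precision and recall for each class. For Cat, 56 images were predicted as Cat (50 + 4 + 2) and 50 of those were correct, so precision is 50/56, or about 0.89. There were 55 actual Cat images (50 + 2 + 3) and 50 were found, so recall is 50/55, or about 0.91. Overall accuracy is the diagonal sum over the total: 135/150 = 0.90.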
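The per-class calculation can be sketched in plain Python as follows. This assumes the matrix is read with rows as actual classes and columns as predicted classes; the function names `per_class_metrics` and `overall_accuracy` are hypothetical helpers, not part of TensorFlow.

```python
# Confusion matrix from the example: rows = actual class, columns = predicted class
# (this orientation is an assumption about how the table is laid out).
classes = ["Cat", "Dog", "Bird"]
cm = [
    [50, 2, 3],   # actual Cat
    [4, 45, 1],   # actual Dog
    [2, 3, 40],   # actual Bird
]


def per_class_metrics(cm):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        actual_total = sum(cm[k])                         # row sum
        predicted_total = sum(cm[i][k] for i in range(n))  # column sum
        precision = tp / predicted_total if predicted_total else 0.0
        recall = tp / actual_total if actual_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


def overall_accuracy(cm):
    """Diagonal sum divided by the total number of images."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


metrics = per_class_metrics(cm)
for name, (p, r, f1) in zip(classes, metrics):
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
print(f"accuracy={overall_accuracy(cm):.3f}")
# Cat: precision=0.893 recall=0.909 f1=0.901
# Dog: precision=0.900 recall=0.900 f1=0.900
# Bird: precision=0.909 recall=0.889 f1=0.899
# accuracy=0.900
```

All three classes land close to 0.90 here, so accuracy is a fair summary for this particular matrix; that would stop being true if one row were much smaller than the others.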

Precision vs Recall tradeoff with examples

In image classification, sometimes we want the model to be right almost every time it assigns a certain class (high precision). For example, when the model labels an image as a rare animal, we want that label to be wrong as rarely as possible.

Other times, we want to catch as many images of a class as possible (high recall). For example, in medical image classification, missing an image that shows disease is usually worse than raising a false alarm.

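Raising the decision threshold for a class typically increases precision and lowers recall; lowering it does the opposite. Changes to the model or its training can shift this balance as well. Understanding the tradeoff helps you choose the right balance for your task.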
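The threshold effect can be illustrated with a small sketch. It assumes a binary (or one-vs-rest) setting in which the model outputs a score for the positive class; the labels and scores below are invented for illustration, and `precision_recall_at` is a hypothetical helper.

```python
# Hypothetical data: 1 = positive class (for example "disease"), 0 = negative.
labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
# Hypothetical model scores (predicted probability of the positive class).
scores = [0.95, 0.90, 0.80, 0.65, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]


def precision_recall_at(threshold, labels, scores):
    """Predict positive when score >= threshold, then compute precision and recall."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    # Convention used here: report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


for threshold in (0.3, 0.5, 0.7):
    p, r = precision_recall_at(threshold, labels, scores)
    print(f"threshold={threshold:.1f} precision={p:.3f} recall={r:.3f}")
# threshold=0.3 precision=0.625 recall=1.000
# threshold=0.5 precision=0.833 recall=1.000
# threshold=0.7 precision=1.000 recall=0.600
```

On this toy data, moving the threshold from 0.5 to 0.7 removes the one false positive but also drops two true positives, so precision rises while recall falls.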

What good vs bad metric values look like

Good CNN model for image classification (rough rules of thumb; acceptable values depend on the task and dataset):

Accuracy above 90% on balanced test data
Precision and recall above 85% for all major classes
Confusion matrix shows most predictions on the diagonal (correct class)

Bad CNN model:

Accuracy below 70% on balanced test data
Precision or recall very low (below 50%) for some classes
Confusion matrix shows many off-diagonal errors (wrong predictions)
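One way to apply rules of thumb like these is to flag every class whose precision or recall falls below a chosen floor. The sketch below is a hypothetical helper with invented per-class numbers; the 0.85 default mirrors the figure in the list above and should be tuned to the task.

```python
# Hypothetical per-class results from an evaluation run: (precision, recall).
per_class = {
    "Cat": (0.93, 0.91),
    "Dog": (0.90, 0.88),
    "Bird": (0.72, 0.41),
}


def weak_classes(per_class, floor=0.85):
    """Return the names of classes whose precision or recall is below `floor`."""
    return sorted(
        name
        for name, (precision, recall) in per_class.items()
        if precision < floor or recall < floor
    )


print(weak_classes(per_class))             # ['Bird']
print(weak_classes(per_class, floor=0.4))  # []
```

A check like this is only a screening step; the confusion matrix still shows which classes the weak ones are being confused with.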

Common pitfalls in CNN metrics

Accuracy paradox: High accuracy but poor performance on rare classes.
Data leakage: Test images accidentally appear in training, inflating metrics.
Overfitting: Very high training accuracy but low test accuracy.
Ignoring class imbalance: Accuracy and support-weighted averages are dominated by the large classes and can hide poor results on small classes. Check per-class metrics or a macro average as well.
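The last pitfall can be seen by comparing two ways of averaging recall. The sketch below uses invented counts for three classes of very different size; the class names and numbers are assumptions for illustration only.

```python
# Hypothetical test results per class: (number of test images, number classified correctly).
results = {
    "common": (900, 882),
    "medium": (90, 81),
    "rare": (10, 2),
}

# Recall for each class: correct / total images of that class.
recalls = {name: correct / total for name, (total, correct) in results.items()}

# Support-weighted average recall. For single-label classification this
# is the same number as overall accuracy.
total_images = sum(total for total, _ in results.values())
weighted_recall = sum(correct for _, correct in results.values()) / total_images

# Macro average: every class counts equally, regardless of size.
macro_recall = sum(recalls.values()) / len(recalls)

print(f"weighted recall = {weighted_recall:.3f}")  # 0.965
print(f"macro recall    = {macro_recall:.3f}")     # 0.693
print(f"rare recall     = {recalls['rare']:.3f}")  # 0.200
```

The weighted figure looks healthy because 900 of the 1,000 images belong to one class, while the macro average drops sharply once the rare class is given equal weight.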

Self-check question

Your CNN model for image classification has 98% accuracy but only 12% recall on a rare class. Is it good for production?

Answer: No. The model misses most images of the rare class (low recall), which can be critical depending on the use case. High overall accuracy hides this problem. You should improve recall for that class before production.
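To see how 98% accuracy and 12% recall can coexist, here is one set of counts that produces exactly those figures. The numbers are hypothetical (10,000 test images, 200 of them in the rare class) and were chosen only to match the question.

```python
# Hypothetical counts, treating the rare class as "positive".
tn = 9776  # common images correctly labeled common
fp = 24    # common images wrongly labeled rare
fn = 176   # rare images the model missed
tp = 24    # rare images correctly labeled rare

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
rare_recall = tp / (tp + fn)
rare_precision = tp / (tp + fp)

print(f"accuracy       = {accuracy:.2%}")        # 98.00%
print(f"rare recall    = {rare_recall:.2%}")     # 12.00%
print(f"rare precision = {rare_precision:.2%}")  # 50.00%
print(f"missed {fn} of {tp + fn} rare images")   # missed 176 of 200 rare images
```

Because the rare class is only 2% of the data in this scenario, missing 176 of its 200 images costs less than two points of accuracy.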

Key Result

Accuracy summarizes overall performance, but per-class precision and recall reveal how well an image classifier handles each category.