
Pre-trained models (VGG, ResNet, MobileNet) in TensorFlow - Model Metrics & Evaluation

Which metric matters for Pre-trained models and WHY

When using pre-trained models like VGG, ResNet, or MobileNet, the key metrics to watch are accuracy and loss during training and validation. Accuracy tells us how often the model predicts correctly. Loss shows how far off the model's predictions are from the true answers. Since these models are often used for image classification, accuracy is a simple and clear way to see if the model is learning well.

Additionally, if the task is imbalanced (some classes appear more than others), metrics like precision, recall, and F1 score become important to understand how well the model handles each class.
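As a minimal sketch of how these metrics get tracked in practice, here is a fine-tuning head compiled on top of a frozen pre-trained backbone. The 3-class output size and 96×96 input shape are assumptions for illustration, and `weights=None` skips downloading the ImageNet weights; in a real fine-tuning run you would use `weights="imagenet"`.

```python
import tensorflow as tf

# Sketch: frozen pre-trained backbone + small classification head.
# Input shape and class count (3) are illustrative assumptions.
base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights=None)
base.trainable = False  # freeze the pre-trained features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# Keras reports both loss and accuracy for training and validation
# data during model.fit(...), which is what you watch for learning progress.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Passing `validation_data=...` to `model.fit` then gives you the validation accuracy and loss discussed above at the end of every epoch.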

Confusion matrix example

Suppose we have a 3-class image classifier using a pre-trained model. Here is a confusion matrix showing predictions:

|              | Predicted Class A | Predicted Class B | Predicted Class C |
|--------------|-------------------|-------------------|-------------------|
| True Class A | 50                | 5                 | 0                 |
| True Class B | 3                 | 45                | 2                 |
| True Class C | 1                 | 4                 | 40                |

This means that out of 55 true Class A images, 50 were correctly predicted (true positives for A) and 5 were misclassified as B (false negatives for A, and at the same time false positives for B). The same logic applies to the other classes.
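Per-class precision and recall fall directly out of this matrix: precision for a class divides the diagonal entry by its column sum (everything predicted as that class), while recall divides it by its row sum (everything truly in that class). A small sketch using the numbers above:

```python
# Confusion matrix from the table above: rows = true class, cols = predicted.
matrix = {
    "A": {"A": 50, "B": 5, "C": 0},
    "B": {"A": 3, "B": 45, "C": 2},
    "C": {"A": 1, "B": 4, "C": 40},
}
classes = list(matrix)

def precision(cls):
    # Of everything *predicted* as cls (column sum), how much was truly cls?
    predicted = sum(matrix[t][cls] for t in classes)
    return matrix[cls][cls] / predicted

def recall(cls):
    # Of everything *truly* cls (row sum), how much did the model catch?
    actual = sum(matrix[cls][p] for p in classes)
    return matrix[cls][cls] / actual

for c in classes:
    print(f"{c}: precision={precision(c):.3f}  recall={recall(c):.3f}")
```

For Class A this gives precision 50/54 ≈ 0.926 and recall 50/55 ≈ 0.909; overall accuracy is the diagonal sum over the total, 135/150 = 0.90.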

Precision vs Recall tradeoff with examples

Using pre-trained models, sometimes you want to tune for precision or recall depending on the task:

  • High Precision: If you use MobileNet for a face recognition app that unlocks a phone, you want high precision. This means when the model says "this is you," it should be very sure to avoid unlocking for the wrong person.
  • High Recall: If you use ResNet to detect rare diseases in medical images, high recall is important. You want to catch as many disease cases as possible, even if some false alarms happen.

Pre-trained models can be fine-tuned to balance precision and recall based on these needs.
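One common way to trade precision against recall without retraining is to move the decision threshold on the model's output score. The scores and labels below are made-up numbers purely to illustrate the effect: raising the threshold makes the detector pickier (higher precision, lower recall), and lowering it makes it more permissive (higher recall, lower precision).

```python
# Toy binary detector outputs (made-up scores and ground-truth labels).
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20]
labels = [1,    1,    1,    0,    1,    0,    0,    0]

def precision_recall(threshold):
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

print(precision_recall(0.5))  # permissive threshold: favors recall
print(precision_recall(0.7))  # strict threshold: favors precision
```

On this data, the 0.5 threshold yields precision 0.80 with recall 1.00, while raising it to 0.7 yields precision 1.00 with recall 0.75: the same model, tuned for the face-unlock case versus the disease-screening case.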

What "good" vs "bad" metric values look like

For pre-trained models on common image tasks:

  • Good: Accuracy above 85% on validation data, low loss, balanced precision and recall above 80% for all classes.
  • Bad: Accuracy below 60%, high loss, or very low precision/recall for some classes (e.g., 30%). This means the model is not learning well or is biased.

Remember, a high accuracy alone can be misleading if the data is imbalanced.
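A quick toy calculation shows why. Suppose 95% of the images belong to one class (the 95/5 split here is an illustrative assumption): a degenerate "model" that always predicts the majority class scores 95% accuracy while catching zero examples of the rare class.

```python
# Toy imbalanced dataset: 95 majority-class (0) labels, 5 rare-class (1).
labels = [0] * 95 + [1] * 5
preds = [0] * 100  # a "model" that always predicts the majority class

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
recall_rare = sum(p == 1 and y == 1 for p, y in zip(preds, labels)) / 5

print(accuracy)     # high accuracy despite learning nothing
print(recall_rare)  # every rare-class example is missed
```

This is exactly the accuracy paradox described in the pitfalls below: the headline number looks "good" while per-class recall is zero.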

Common pitfalls with metrics for pre-trained models

  • Accuracy paradox: High accuracy can happen if one class dominates the data. The model might ignore smaller classes.
  • Data leakage: If test images are too similar to (or duplicated from) training images, the metrics look inflated, but the model won't generalize to truly new data.
  • Overfitting: Training accuracy is high but validation accuracy is low. The model memorizes training data but fails on new data.
  • Ignoring class imbalance: Not checking precision and recall per class can hide poor performance on rare classes.
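The overfitting pitfall above is easy to check from the training history that `model.fit` returns. The accuracy curves below are made-up numbers for illustration, and the 0.1 gap cutoff is an arbitrary heuristic, not a standard value:

```python
# Made-up per-epoch accuracies in the shape of a Keras History.history dict.
train_acc = [0.70, 0.85, 0.93, 0.97, 0.99]
val_acc = [0.68, 0.80, 0.82, 0.81, 0.79]

def final_gap(train, val):
    """Gap between final training and validation accuracy."""
    return train[-1] - val[-1]

gap = final_gap(train_acc, val_acc)
print(f"train/val accuracy gap: {gap:.2f}")
if gap > 0.1:  # arbitrary cutoff for this sketch
    print("Large gap: likely overfitting; add regularization or freeze more layers.")
```

With a real Keras run you would pass `history.history["accuracy"]` and `history.history["val_accuracy"]` instead of the hard-coded lists.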

Self-check question

Your pre-trained model fine-tuned on a dataset has 98% accuracy but only 12% recall on the rare class. Is it good for production? Why or why not?

Answer: No, it is not good. The model misses most examples of the rare class (low recall), which could be critical if that class is important (like a disease). High accuracy is misleading because the rare class is small compared to others.

Key Result
Accuracy is key for overall performance, but precision and recall reveal class-specific strengths and weaknesses in pre-trained models.