Computer Vision · ~8 min read

Architecture search concepts in Computer Vision - Model Metrics & Evaluation

Which metric matters for Architecture Search and WHY

When searching for the best model design, we want metrics that show how well the model learns and, more importantly, how well it generalizes.

Common choices are validation accuracy and validation loss. These tell us whether the model performs well on data it was not trained on, not just on the training set.

For classification tasks, accuracy is simple and clear. For imbalanced data, metrics like F1 score or AUC help us understand performance better.

We focus on validation metrics because architecture search tries many designs. We want to pick the one that performs best on data it hasn't seen before.
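The selection step above can be sketched in a few lines of plain Python. This is a minimal illustration, not a real search loop: the labels and the two candidates' predictions are made-up, and the `accuracy` helper stands in for whatever validation metric the search uses.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    assert len(y_true) == len(y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical validation labels and predictions from two candidate architectures.
y_val  = [1, 0, 1, 1, 0, 0, 1, 0]
pred_a = [1, 0, 1, 0, 0, 0, 1, 0]  # candidate A: 7/8 correct
pred_b = [1, 1, 1, 0, 0, 1, 1, 0]  # candidate B: 5/8 correct

# The search keeps the candidate with the higher validation score.
best_name, best_pred = max([("A", pred_a), ("B", pred_b)],
                           key=lambda c: accuracy(y_val, c[1]))
print(best_name, accuracy(y_val, best_pred))  # A 0.875
```

The key point is that the comparison uses held-out validation data; ranking candidates by training accuracy instead would reward memorization.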

Confusion Matrix Example

Suppose we test a model found by architecture search on 100 images of cats and dogs.

      |            | Predicted Cat | Predicted Dog |
      |------------|---------------|---------------|
      | Actual Cat | 40            | 3             |
      | Actual Dog | 5             | 52            |

Here:

  • TP (Cat) = 40
  • FP (Cat) = 5
  • FN (Cat) = 3
  • TN (Cat) = 52

Precision = 40 / (40 + 5) = 0.89

Recall = 40 / (40 + 3) = 0.93

F1 = 2 * (0.89 * 0.93) / (0.89 + 0.93) ≈ 0.91
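The same arithmetic, written out as a short Python sketch using the counts from the table above:

```python
# Confusion-matrix counts for the "Cat" class.
tp, fp, fn, tn = 40, 5, 3, 52

precision = tp / (tp + fp)   # 40/45 ≈ 0.889
recall = tp / (tp + fn)      # 40/43 ≈ 0.930
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.89 0.93 0.91
```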

Precision vs Recall Tradeoff in Architecture Search

Architecture search can find models that favor precision or recall depending on the goal.

For example, in a medical imaging task, high recall is critical so that no disease cases are missed, even at the cost of some false alarms.

In contrast, for a photo tagging app, high precision is better to avoid wrong tags.

Architecture search helps find models balancing these metrics by testing many designs and picking the best tradeoff.
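One common way to encode this preference in a single search objective is the F-beta score, which generalizes F1: beta > 1 weights recall more heavily, beta < 1 weights precision more heavily. The two candidates below are hypothetical, chosen only to illustrate how the choice of beta flips which model wins.

```python
def fbeta(precision, recall, beta):
    """F-beta score: beta > 1 favors recall, beta < 1 favors precision."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Two hypothetical candidates from a search run.
cand_medical = {"precision": 0.70, "recall": 0.95}  # catches nearly all positives
cand_tagging = {"precision": 0.95, "recall": 0.70}  # rarely tags incorrectly

# A recall-weighted objective (beta=2) prefers the medical-style model...
print(fbeta(beta=2, **cand_medical) > fbeta(beta=2, **cand_tagging))      # True
# ...while a precision-weighted objective (beta=0.5) prefers the tagging one.
print(fbeta(beta=0.5, **cand_tagging) > fbeta(beta=0.5, **cand_medical))  # True
```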

Good vs Bad Metric Values for Architecture Search

Good: Validation accuracy above 85% with balanced precision and recall (both above 80%) means the architecture generalizes well.

Bad: Very high training accuracy (e.g., 99%) but low validation accuracy (e.g., 60%) means overfitting. The architecture is too complex or not suitable.

Also, very low precision or recall (below 50%) indicates the model misses important cases or makes many mistakes.
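A search loop can reject candidates like the "bad" case above automatically by comparing training and validation accuracy. The sketch below uses a 0.10 gap as an illustrative cutoff; the right threshold depends on the task.

```python
def looks_overfit(train_acc, val_acc, max_gap=0.10):
    """Flag a candidate whose train/validation accuracy gap exceeds a threshold.

    The 0.10 default is an illustrative cutoff, not a universal rule.
    """
    return (train_acc - val_acc) > max_gap

print(looks_overfit(0.99, 0.60))  # True  -> reject: memorizing training data
print(looks_overfit(0.88, 0.85))  # False -> acceptable generalization gap
```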

Common Pitfalls in Metrics for Architecture Search
  • Accuracy Paradox: High accuracy can be misleading if data is imbalanced. For example, on data that is 95% negative, a model that always predicts "negative" scores 95% accuracy while catching zero positives.
  • Data Leakage: If validation data leaks into training, metrics look unrealistically good, hiding true performance.
  • Overfitting: Architecture search may pick overly complex models that fit training data perfectly but fail on new data.
  • Ignoring Validation Metrics: Only looking at training loss or accuracy can mislead. Validation metrics are key to choosing the best architecture.
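The accuracy paradox from the list above is easy to demonstrate with a degenerate classifier on an imbalanced toy dataset:

```python
# 95 negatives and 5 positives; a model that always predicts "negative".
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = tp / (tp + fn)

print(accuracy, recall)  # 0.95 0.0
```

Accuracy looks strong at 95%, yet recall is zero: the model never finds a single positive. This is why a search judged on accuracy alone can converge on a useless architecture for imbalanced tasks.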
Self Check: Is a Model with 98% Accuracy but 12% Recall Good?

No, this model is not good for tasks like fraud detection or disease diagnosis.

Even though accuracy is high, recall is very low. This means the model misses most positive cases (fraud or disease).

In such cases, recall is more important because missing positives can be costly or dangerous.

Architecture search should focus on improving recall, even if accuracy drops a bit.
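To make the self-check concrete, here is one set of confusion-matrix counts consistent with 98% accuracy but 12% recall. The numbers (5,000 cases, 100 of them positive) are illustrative, not from the text.

```python
# Counts consistent with 98% accuracy but 12% recall.
tp, fn = 12, 88        # only 12 of 100 positives are caught
tn, fp = 4888, 12      # negatives are almost all classified correctly

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
print(accuracy, recall)  # 0.98 0.12
```

The 88 missed positives are invisible in the accuracy number because they are swamped by 4,888 easy negatives.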

Key Result
Validation accuracy and balanced precision-recall are key to selecting good architectures that generalize well.