TensorFlow · ML · ~8 mins

Sequential model API in TensorFlow - Model Metrics & Evaluation

Which metric matters for Sequential model API and WHY

The Sequential model API is often used for simple neural networks where layers are stacked one after another. The key metrics to evaluate such models depend on the task:

  • For classification: Accuracy, Precision, Recall, and F1 score help understand how well the model predicts classes.
  • For regression: Mean Squared Error (MSE) or Mean Absolute Error (MAE) show how close predictions are to actual values.

Choosing the right metric helps you know if your model is learning well and making useful predictions.
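As a sketch of how this choice shows up in code, a Keras Sequential model can be told which metrics to track at compile time. The layer sizes and optimizer below are illustrative assumptions, not taken from the original text:

```python
import tensorflow as tf

# Hypothetical binary classifier; the architecture is illustrative only.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Classification: track accuracy, precision, and recall during training.
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy",
             tf.keras.metrics.Precision(),
             tf.keras.metrics.Recall()],
)

# For a regression model you would instead compile with e.g.:
# model.compile(optimizer="adam", loss="mse", metrics=["mae"])
```

The metrics passed here are reported alongside the loss after every epoch, which makes it easy to watch the quantities discussed below rather than the loss alone.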

Confusion matrix example for classification

Imagine a Sequential model classifying emails as spam or not spam. Here is a confusion matrix from 100 emails:

      |                 | Predicted Spam | Predicted Not Spam |
      |-----------------|----------------|--------------------|
      | Actual Spam     | TP = 40        | FN = 5             |
      | Actual Not Spam | FP = 10        | TN = 45            |

Totals: TP + FP + FN + TN = 40 + 10 + 5 + 45 = 100 emails

From this, we calculate:

  • Precision = TP / (TP + FP) = 40 / (40 + 10) = 0.8
  • Recall = TP / (TP + FN) = 40 / (40 + 5) ≈ 0.8889
  • Accuracy = (TP + TN) / Total = (40 + 45) / 100 = 0.85
  • F1 Score = 2 * (Precision * Recall) / (Precision + Recall) ≈ 0.8421
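
The arithmetic above can be reproduced in a few lines of plain Python, using the counts from the confusion matrix:

```python
# Counts taken from the confusion matrix above.
tp, fp, fn, tn = 40, 10, 5, 45
total = tp + fp + fn + tn  # 100 emails

precision = tp / (tp + fp)                          # 40 / 50  = 0.8
recall = tp / (tp + fn)                             # 40 / 45  ≈ 0.8889
accuracy = (tp + tn) / total                        # 85 / 100 = 0.85
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.8421
```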

Precision vs Recall tradeoff with examples

When using Sequential models for classification, you often balance precision and recall:

  • High Precision: Means fewer false alarms. For example, a spam filter that rarely marks good emails as spam.
  • High Recall: Means catching most positive cases. For example, a disease detector that finds almost all sick patients.

Depending on your goal, you might want to adjust the model or threshold to favor one metric over the other.
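One sketch of how the threshold controls this tradeoff, using a hypothetical handful of predicted probabilities (the numbers are made up for illustration): raising the threshold makes the model pickier (higher precision, lower recall), while lowering it catches more positives (higher recall, lower precision).

```python
# Hypothetical model outputs: predicted probability of "positive" per example.
probs = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]  # ground truth

def precision_recall(threshold):
    """Classify as positive when prob >= threshold; return (precision, recall)."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# A strict threshold favors precision; a loose one favors recall.
strict = precision_recall(0.90)  # (1.0, 0.333...) — no false alarms, misses cases
loose = precision_recall(0.35)   # (0.75, 1.0)    — catches everything, some noise
```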

What "good" vs "bad" metric values look like for Sequential models

For a Sequential model used in classification:

  • Good: Accuracy above 80%, Precision and Recall above 75%, balanced F1 score.
  • Bad: Accuracy near random guess (e.g., 50% for two classes), very low Precision or Recall (below 50%), or large difference between Precision and Recall.

For regression tasks, good models have low MSE or MAE, meaning predictions are close to actual values.
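For reference, both regression metrics are simple averages over the prediction errors; a minimal sketch with made-up values:

```python
# Hypothetical regression targets and predictions.
y_true = [3.0, 5.0, 2.0]
y_pred = [2.5, 5.5, 2.0]
n = len(y_true)

# MSE squares each error, so large mistakes are penalized more heavily.
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

# MAE averages absolute errors, so it is in the same units as the target.
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
```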

Common pitfalls when evaluating Sequential models

  • Accuracy paradox: High accuracy can be misleading if classes are imbalanced (e.g., 95% accuracy by always predicting the majority class).
  • Data leakage: When test data leaks into training, metrics look better but model fails in real use.
  • Overfitting: Model performs very well on training data but poorly on new data, showing high training accuracy but low test accuracy.
  • Ignoring metric choice: Using accuracy alone for imbalanced data can hide poor performance on minority classes.
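
The accuracy paradox is easy to demonstrate with a toy imbalanced dataset: a "model" that always predicts the majority class scores 95% accuracy while detecting nothing.

```python
# Imbalanced toy data: 95 negatives, 5 positives.
labels = [0] * 95 + [1] * 5

# A degenerate model that always predicts the majority class (negative).
preds = [0] * 100

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)  # 0.95

tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)  # 0
fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)  # 5
recall = tp / (tp + fn)  # 0.0 — every positive case is missed
```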

Self-check question

Your Sequential model has 98% accuracy but only 12% recall on fraud detection. Is it good for production? Why or why not?

Answer: No, it is not good. Although accuracy is high, the model misses 88% of fraud cases (low recall). In fraud detection, catching fraud (high recall) is critical to avoid losses. This model would let most fraud go undetected.

Key Result
For Sequential models, balance precision and recall to ensure meaningful predictions beyond just accuracy.