When serving a model with a Flask API, the key metrics to watch are latency and accuracy. Latency measures how quickly the API responds to a request, which directly affects user experience. Accuracy tells us whether the model's predictions are correct. Both matter, because a slow prediction and a wrong prediction each hurt users.
Flask API for Model Serving in ML (Python): Model Metrics & Evaluation
Which metrics matter for Flask API model serving, and why
Confusion matrix example for classification model served by Flask API
|                 | Predicted Positive     | Predicted Negative     |
|-----------------|------------------------|------------------------|
| Actual Positive | True Positive (TP): 50 | False Negative (FN): 10 |
| Actual Negative | False Positive (FP): 5 | True Negative (TN): 35 |
Total samples = 50 + 10 + 5 + 35 = 100
From this, we calculate:
- Precision = TP / (TP + FP) = 50 / (50 + 5) = 0.91
- Recall = TP / (TP + FN) = 50 / (50 + 10) = 0.83
- F1 Score = 2 * (Precision * Recall) / (Precision + Recall) ≈ 0.87
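The calculations above can be reproduced directly from the confusion matrix counts; this is a minimal, dependency-free sketch:

```python
# Metric calculations from the confusion matrix above (TP=50, FN=10, FP=5, TN=35).
tp, fn, fp, tn = 50, 10, 5, 35

accuracy = (tp + tn) / (tp + fn + fp + tn)          # 85 / 100 = 0.85
precision = tp / (tp + fp)                          # 50 / 55 ≈ 0.91
recall = tp / (tp + fn)                             # 50 / 60 ≈ 0.83
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.87

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

In practice you would collect these counts from logged API predictions and ground-truth labels rather than hard-coding them.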
Precision vs Recall tradeoff with Flask API example
Imagine your Flask API serves a spam detection model:
- High Precision: The API rarely marks good emails as spam. Users trust it because false alarms are low.
- High Recall: The API catches almost all spam emails, but may mark some good emails as spam.
Depending on user needs, you might prefer one over the other. For example, if missing spam is worse, prioritize recall. If annoying users with false spam is worse, prioritize precision.
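The tradeoff usually comes down to the decision threshold applied to the model's score. The sketch below uses a handful of hypothetical spam scores (the data is invented for illustration) to show that raising the threshold favors precision while lowering it favors recall:

```python
# Hypothetical (model_score, is_spam) pairs for illustration only.
samples = [
    (0.95, 1), (0.90, 1), (0.85, 0), (0.80, 1),
    (0.60, 1), (0.55, 0), (0.40, 1), (0.20, 0),
]

def precision_recall(threshold):
    """Compute precision and recall when flagging scores >= threshold as spam."""
    tp = sum(1 for s, y in samples if s >= threshold and y == 1)
    fp = sum(1 for s, y in samples if s >= threshold and y == 0)
    fn = sum(1 for s, y in samples if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# A strict threshold rarely flags good email (high precision, low recall);
# a loose threshold catches nearly all spam (high recall, lower precision).
print(precision_recall(0.9))  # → (1.0, 0.4)
print(precision_recall(0.3))  # → (~0.71, 1.0)
```

A Flask API could expose the threshold as a configuration value so it can be tuned for the users' priorities without retraining the model.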
Good vs Bad metric values for Flask API model serving
Good metrics:
- Accuracy above 85% for balanced data
- Precision and recall both above 80%
- API latency under 200 milliseconds
Bad metrics:
- Accuracy near random guess (e.g., 50% for binary)
- Precision very low (e.g., 30%) causing many false positives
- Recall very low (e.g., 20%) missing many true cases
- API latency over 1 second causing slow user experience
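Latency is easy to measure per request with the standard library. This is a minimal sketch in which `predict` is a stand-in placeholder for a real model call and the 200 ms budget comes from the guideline above:

```python
import time

LATENCY_BUDGET_MS = 200  # target from the guidelines above

def predict(features):
    """Stand-in placeholder for a real model's predict call."""
    return sum(features) > 1.0

def timed_predict(features):
    """Run a prediction and report its latency in milliseconds."""
    start = time.perf_counter()
    result = predict(features)
    latency_ms = (time.perf_counter() - start) * 1000
    return result, latency_ms

result, latency_ms = timed_predict([0.4, 0.9])
print(f"prediction={result} latency={latency_ms:.2f} ms "
      f"(budget {LATENCY_BUDGET_MS} ms)")
```

In a real Flask endpoint, the same timing logic can wrap the model call so each response can be logged and compared against the latency budget.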
Common pitfalls in Flask API model serving metrics
- Accuracy paradox: High accuracy but poor recall or precision due to imbalanced data.
- Data leakage: Model trained on data similar to test data inflates metrics but fails in real API use.
- Overfitting: Model performs well on training but poorly on API requests from new data.
- Ignoring latency: A model with great accuracy but slow API response frustrates users.
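The accuracy paradox is easy to demonstrate with a hypothetical imbalanced dataset: a model that never flags fraud still scores 98% accuracy while its recall is zero.

```python
# Hypothetical imbalanced dataset: 980 legitimate cases, 20 fraud cases.
labels = [0] * 980 + [1] * 20

# A "model" that always predicts the majority class (never flags fraud).
predictions = [0] * 1000

tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
correct = sum(1 for y, p in zip(labels, predictions) if y == p)

accuracy = correct / len(labels)  # 0.98 -- looks great
recall = tp / (tp + fn)           # 0.0  -- catches no fraud at all
print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
```

This is why accuracy alone is never enough for an imbalanced problem served behind an API: precision and recall must be monitored alongside it.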
Self-check question
Your Flask API model has 98% accuracy but only 12% recall on fraud detection. Is it good for production? Why or why not?
Answer: No, it is not good. The low recall means the model misses most fraud cases, which is dangerous. Even with high accuracy, missing fraud is costly. You should improve recall before production.
Key Result
For Flask API model serving, balance accuracy and latency, and carefully monitor precision and recall to ensure useful, timely predictions.