REST API inference in PyTorch - Model Metrics & Evaluation

When using a REST API to serve predictions from a machine learning model, the key metrics to watch are latency and accuracy. Latency tells us how fast the API responds, which matters for user experience. Accuracy tells us how often the model's predictions are correct. For classification tasks, precision, recall, and F1-score describe prediction quality; for regression, mean squared error or mean absolute error are the usual choices. We want a balance: fast responses and good prediction quality.
Imagine a REST API that classifies emails as spam or not spam. Here is a confusion matrix from 100 requests:
|                    | Predicted Spam            | Predicted Not Spam        |
|--------------------|---------------------------|---------------------------|
| Actually Spam      | True Positives (TP) = 40  | False Negatives (FN) = 5  |
| Actually Not Spam  | False Positives (FP) = 10 | True Negatives (TN) = 45  |
Total requests = 40 + 10 + 5 + 45 = 100
From this, we calculate:
- Precision = TP / (TP + FP) = 40 / (40 + 10) = 0.8
- Recall = TP / (TP + FN) = 40 / (40 + 5) = 0.8889
- F1-score = 2 * (Precision * Recall) / (Precision + Recall) ≈ 0.842
- Accuracy = (TP + TN) / Total = (40 + 45) / 100 = 0.85
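The calculations above can be checked with a few lines of plain Python; the counts come straight from the confusion matrix:

```python
# Recompute the metrics from the confusion-matrix counts above.
tp, fp, fn, tn = 40, 10, 5, 45

total = tp + fp + fn + tn            # 100 requests
precision = tp / (tp + fp)           # 40 / 50 = 0.8
recall = tp / (tp + fn)              # 40 / 45 ≈ 0.8889
f1 = 2 * precision * recall / (precision + recall)   # ≈ 0.8421
accuracy = (tp + tn) / total         # 85 / 100 = 0.85

print(f"precision={precision:.4f} recall={recall:.4f} "
      f"f1={f1:.4f} accuracy={accuracy:.4f}")
```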
In REST API inference, sometimes we want to avoid false alarms (minimize false positives), and sometimes we want to catch every true case (minimize false negatives).
- High Precision Example: A spam filter API where marking a legitimate email as spam is costly. We want high precision to avoid false positives.
- High Recall Example: A medical diagnosis API where missing a disease is dangerous. We want high recall to catch all true cases.
Choosing the right metric depends on the API's purpose and user impact.
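In practice this trade-off is often controlled by the decision threshold applied to the model's score. Below is an illustrative sketch with made-up scores and labels (not from the confusion matrix above): a strict threshold favors precision, a lenient one favors recall.

```python
# Sketch: how the decision threshold trades precision against recall.
# `scores` and `labels` are made-up example data (1 = spam).
def precision_recall(scores, labels, threshold):
    preds = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

scores = [0.95, 0.85, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,    1,    0,   1,   1,   0,   0,   0]

# Strict threshold: no false alarms, but half the spam slips through.
print(precision_recall(scores, labels, 0.8))   # (1.0, 0.5)
# Lenient threshold: all spam caught, at the cost of a false alarm.
print(precision_recall(scores, labels, 0.35))  # (0.8, 1.0)
```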
For a REST API serving predictions:
- Good: Accuracy above 90%, precision and recall balanced above 85%, latency under 200ms.
- Bad: Accuracy below 70%, precision or recall below 50%, latency over 1 second, which makes the API feel sluggish.
Good metrics mean users get fast and reliable predictions. Bad metrics mean slow or wrong results, hurting trust.
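Latency is easy to measure on the serving side. A minimal sketch, where the hypothetical `predict` function stands in for the real model call behind the endpoint; in production you would time the actual request handler the same way:

```python
import time

def predict(email_text):
    """Stand-in for the real model call behind the endpoint (hypothetical)."""
    time.sleep(0.01)          # simulate ~10 ms of model inference
    return "not spam"

start = time.perf_counter()
label = predict("Limited offer, click now!")
latency_ms = (time.perf_counter() - start) * 1000

print(f"prediction={label!r} latency={latency_ms:.1f} ms")
```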
- Accuracy paradox: High accuracy can be misleading when classes are imbalanced. If 95% of the data belongs to one class, a model that always predicts that class scores 95% accuracy while never detecting the other class.
- Data leakage: If test data leaks into training, API predictions look too good but fail in real use.
- Overfitting: Model performs well on training but poorly on new API requests.
- Ignoring latency: A very accurate model that is too slow makes the API unusable.
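The accuracy paradox from the list above can be demonstrated in a few lines: a model that always predicts the majority class looks accurate but has zero recall.

```python
# Accuracy-paradox demo: always predict the majority class ("not spam")
# on data that is 95% majority class.
labels = [0] * 95 + [1] * 5          # 1 = spam, only 5% of traffic
preds = [0] * 100                    # model ignores the minority class

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
tp = sum(p and y for p, y in zip(preds, labels))
fn = sum((not p) and y for p, y in zip(preds, labels))
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")  # 0.95 and 0.00
```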
Your REST API model has 98% accuracy but only 12% recall on fraud detection. Is it good for production? Why or why not?
Answer: No, it is not good. The low recall means the model misses most fraud cases, which is dangerous. Even with high accuracy, the model fails to catch fraud, so it should be improved before production.
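To see how 98% accuracy and 12% recall can coexist, here is one hypothetical set of counts consistent with those numbers (10,000 transactions, 2% of them fraudulent); the exact figures are illustrative, not from the question:

```python
# Hypothetical counts consistent with 98% accuracy and 12% recall.
tp, fn = 24, 176      # 200 fraud cases: only 24 caught
fp, tn = 24, 9776     # 9,800 legitimate transactions

total = tp + fp + fn + tn
accuracy = (tp + tn) / total       # (24 + 9776) / 10000 = 0.98
recall = tp / (tp + fn)            # 24 / 200 = 0.12

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
print(f"fraud cases missed: {fn} of {tp + fn}")
```

The legitimate transactions dominate the total, so the model can miss 176 of 200 fraud cases and still report 98% accuracy.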