
Loading and inference in TensorFlow - Model Metrics & Evaluation

Which metric matters for loading and inference, and why

When we load a saved model and use it to make predictions (inference), the key metric is the one used during training (often accuracy). Comparing it against the pre-save value tells us whether the model still works correctly after loading.

We also care about inference speed: in production, predictions must be returned fast enough to be useful.

So, the main metrics are prediction correctness (like accuracy, precision, recall) and inference time.
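As a minimal sketch of that round trip (the synthetic data, the toy model, and the `model.keras` path are placeholders for your own), you can save a model, reload it, and confirm the metric survives:

```python
import numpy as np
import tensorflow as tf

# Synthetic binary-classification data (stand-in for your real dataset).
x = np.random.rand(200, 4).astype("float32")
y = (x.sum(axis=1) > 2.0).astype("float32")

# A toy classifier (stand-in for whatever model you actually trained).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(x, y, epochs=5, verbose=0)

# Save, reload, and compare the metric before and after the round trip.
model.save("model.keras")
restored = tf.keras.models.load_model("model.keras")

_, acc = model.evaluate(x, y, verbose=0)
_, restored_acc = restored.evaluate(x, y, verbose=0)
print(f"accuracy before save: {acc:.3f}, after load: {restored_acc:.3f}")
```

If the two accuracies differ, something went wrong in serialization or in how the data is prepared at inference time.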

Confusion matrix example after loading a model
    Confusion Matrix:

                   Predicted
                   Pos    Neg
    Actual  Pos     90     10    (TP = 90, FN = 10)
            Neg     15     85    (FP = 15, TN = 85)

    Total samples = 90 + 10 + 15 + 85 = 200

    Precision = TP / (TP + FP) = 90 / (90 + 15) = 0.857
    Recall = TP / (TP + FN) = 90 / (90 + 10) = 0.9
    F1 Score = 2 * (0.857 * 0.9) / (0.857 + 0.9) ≈ 0.878
    Accuracy = (TP + TN) / Total = (90 + 85) / 200 = 0.875
    

This shows the model loaded correctly and performs well on test data.
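The arithmetic above can be reproduced with a short script (the counts are hard-coded from the confusion matrix):

```python
# Counts taken from the confusion matrix above.
tp, fn, fp, tn = 90, 10, 15, 85

total = tp + fn + fp + tn
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / total

print(f"Precision: {precision:.3f}")  # 0.857
print(f"Recall:    {recall:.3f}")     # 0.900
print(f"F1 score:  {f1:.3f}")         # 0.878
print(f"Accuracy:  {accuracy:.3f}")   # 0.875
```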

Precision vs Recall tradeoff in inference

When using a loaded model, sometimes you want to adjust the decision threshold to balance precision and recall.

For example, in a spam filter:

  • High precision means fewer good emails marked as spam (false alarms).
  • High recall means catching most spam emails.

Depending on your goal, you can shift the threshold: raising it usually improves precision at the cost of recall, and lowering it does the opposite.
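A sweep over thresholds makes the tradeoff concrete. The labels and predicted probabilities below are invented for illustration; in practice you would use your loaded model's output scores:

```python
# Hypothetical true labels and predicted probabilities (1 = spam).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.60, 0.40, 0.55, 0.30, 0.20, 0.10]

def precision_recall(threshold):
    """Apply a decision threshold and compute precision and recall."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for t in (0.3, 0.5, 0.7):
    p, r = precision_recall(t)
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, raising the threshold from 0.3 to 0.7 pushes precision up while recall falls, exactly the spam-filter tradeoff described above.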

Good vs Bad metric values after loading a model

Good: Accuracy close to training accuracy, e.g., 85%+ accuracy, balanced precision and recall.

Bad: Accuracy drops significantly after loading, e.g., below 60%, or very low recall or precision.

Also, inference time should be reasonable (e.g., milliseconds per prediction). Very slow inference is bad for real-time use.
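Per-prediction latency is easy to measure with a small timing harness. Here `predict_fn` is a cheap stand-in; in practice you would time your loaded model's predict call instead:

```python
import time

def predict_fn(batch):
    # Stand-in for a real model's predict call; replace with your own.
    return [sum(row) for row in batch]

batch = [[0.1, 0.2, 0.3, 0.4]] * 32

# Warm up once: the first call often pays one-time setup costs
# (graph tracing, memory allocation) that shouldn't be averaged in.
predict_fn(batch)

runs = 100
start = time.perf_counter()
for _ in range(runs):
    predict_fn(batch)
elapsed = time.perf_counter() - start

per_pred_ms = elapsed / (runs * len(batch)) * 1000
print(f"~{per_pred_ms:.4f} ms per prediction")
```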

Common pitfalls in loading and inference metrics
  • Data mismatch: Using different data preprocessing during inference than training causes bad metrics.
  • Model version mismatch: Loading wrong or corrupted model files leads to poor performance.
  • Ignoring inference speed: A model might be accurate but too slow for practical use.
  • Overfitting signs: Very high training accuracy but low inference accuracy means overfitting.
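The data-mismatch pitfall is easy to see with a toy example. The weights and statistics below are invented: a linear scorer "trained" on standardized features gives a completely different answer when the standardization step is skipped at inference:

```python
# Toy linear scorer with invented weights, "trained" on standardized inputs.
weights = [1.5, -0.8]
train_mean, train_std = [100.0, 5.0], [20.0, 2.0]

def standardize(x):
    # The same preprocessing that was applied during training.
    return [(v - m) / s for v, m, s in zip(x, train_mean, train_std)]

def score(x):
    return sum(w * v for w, v in zip(weights, x))

raw = [120.0, 4.0]
print(score(standardize(raw)))  # 1.9  -> what the model expects to see
print(score(raw))               # 176.8 -> preprocessing skipped, garbage scale
```

The model file loaded fine in both cases; only the second pipeline silently fed it inputs on the wrong scale, which is why metrics collapse without any error message.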
Self-check question

Your loaded model has 98% accuracy but only 12% recall on fraud detection. Is it good for production? Why or why not?

Answer: No, it is not good. Although accuracy is high, recall is very low, meaning the model misses most fraud cases. In fraud detection, missing fraud (low recall) is very bad, so this model is not reliable.

Key Result
After loading, check if accuracy and recall remain high and inference speed is fast enough for your use case.