
SavedModel format in TensorFlow - Model Metrics & Evaluation

Which metric matters for SavedModel format and WHY

The SavedModel format is TensorFlow's standard way to save and share trained models. It stores the model's architecture, weights, and computation graph so the model can be reused or deployed later. The key metrics to check when using SavedModel are accuracy and loss, measured both before saving and after loading. This ensures the loaded model performs exactly as well as the one you saved, with no loss in performance.

Why? Because the point of the SavedModel format is to preserve the model's ability to make correct predictions. If accuracy or loss changes after a save/load round trip, the model was not saved or restored correctly.
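A minimal sketch of this round-trip check, using a toy `tf.Module` as an illustrative stand-in for a real trained model (the class name, scaling factor, and temp path are all assumptions for the example):

```python
import tempfile

import numpy as np
import tensorflow as tf

# Toy "model": a single trainable scaling factor (illustrative only).
class Scaler(tf.Module):
    def __init__(self):
        super().__init__()
        self.factor = tf.Variable(2.0)

    @tf.function(input_signature=[tf.TensorSpec([None], tf.float32)])
    def __call__(self, x):
        return x * self.factor

model = Scaler()
x = tf.constant([1.0, 2.0, 3.0])
before = model(x).numpy()

# Save in SavedModel format, then load it back.
path = tempfile.mkdtemp()
tf.saved_model.save(model, path)
restored = tf.saved_model.load(path)
after = restored(x).numpy()

# Predictions should be numerically identical after the round trip.
assert np.allclose(before, after)
```

The same pattern applies to a full Keras model: run your evaluation data through the model before saving and after loading, and compare the outputs.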

Confusion matrix or equivalent visualization

Imagine you have a classification model saved in SavedModel format. After loading, you test it on 100 samples and get this confusion matrix:

      |                 | Predicted Positive     | Predicted Negative      |
      |-----------------|------------------------|-------------------------|
      | Actual Positive | True Positive (TP): 40 | False Negative (FN): 10 |
      | Actual Negative | False Positive (FP): 5 | True Negative (TN): 45  |
    

Total samples = 40 + 10 + 5 + 45 = 100

From this, you can calculate:

  • Precision = TP / (TP + FP) = 40 / (40 + 5) = 0.89
  • Recall = TP / (TP + FN) = 40 / (40 + 10) = 0.80
  • Accuracy = (TP + TN) / Total = (40 + 45) / 100 = 0.85

If these values match before saving and after loading, the SavedModel format worked well.
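The arithmetic above can be reproduced directly, using the counts from the confusion matrix in this section:

```python
# Confusion matrix counts from the table above.
tp, fn, fp, tn = 40, 10, 5, 45
total = tp + fn + fp + tn  # 100

precision = tp / (tp + fp)    # 40 / 45
recall = tp / (tp + fn)       # 40 / 50
accuracy = (tp + tn) / total  # 85 / 100

print(round(precision, 2), round(recall, 2), round(accuracy, 2))
# → 0.89 0.8 0.85
```

If the same counts come out of the loaded model, the round trip preserved the model's behavior.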

Precision vs Recall tradeoff with concrete examples

When you save and load a model using SavedModel, you want to keep the balance between precision and recall intact.

For example, if your model is a spam filter:

  • High precision means few good emails are marked as spam (few false positives).
  • High recall means most spam emails are caught (few false negatives).

If, after loading, precision drops while recall stays high, the model may start marking too many good emails as spam. That is a sign the SavedModel round trip did not preserve the model's behavior.

So, checking precision and recall before and after saving helps ensure the model's behavior stays consistent.
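The spam-filter example can be made concrete with a few hand-written predictions (the labels below are hypothetical, chosen only to illustrate the two metrics):

```python
# Hypothetical spam-filter outputs: 1 = spam, 0 = not spam.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # one missed spam, one false alarm

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)  # fraction of flagged emails that were really spam
recall = tp / (tp + fn)     # fraction of spam emails that were caught

print(precision, recall)  # → 0.75 0.75
```

Running the same labeled emails through the model before saving and after loading should reproduce both numbers.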

What "good" vs "bad" metric values look like for SavedModel format

Good:

  • Accuracy, precision, recall, and loss values before saving and after loading are nearly the same (differences less than 0.01).
  • Confusion matrix counts match closely.
  • Model predictions on test data are identical or very close.

Bad:

  • Significant drop in accuracy or increase in loss after loading.
  • Precision or recall changes drastically, indicating the model behaves differently.
  • Confusion matrix counts differ greatly, showing prediction errors.
  • Model fails to load or throws errors, meaning the SavedModel format is corrupted or incomplete.
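One way to codify these thresholds is a small comparison helper. The 0.01 tolerance mirrors the rule of thumb above; the function and dictionary names are illustrative:

```python
def metrics_match(before, after, tol=0.01):
    """True if every metric changed by less than `tol` across the save/load round trip."""
    return all(abs(before[k] - after[k]) < tol for k in before)

before = {"accuracy": 0.85, "precision": 0.89, "recall": 0.80}
after_good = {"accuracy": 0.85, "precision": 0.888, "recall": 0.801}
after_bad = {"accuracy": 0.85, "precision": 0.89, "recall": 0.12}

print(metrics_match(before, after_good))  # → True
print(metrics_match(before, after_bad))   # → False
```

A check like this fits naturally at the end of a deployment script: refuse to ship the SavedModel if it returns False.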

Metrics pitfalls when using SavedModel format
  • Data leakage: If test data leaks into training, metrics look better than the model's real performance, so even a clean save/load round trip can hide a weak model.
  • Overfitting: High training accuracy but low test accuracy can mislead you about model quality after saving.
  • Version mismatch: Saving with one TensorFlow version and loading with another may cause errors or metric changes.
  • Ignoring metric changes: Not checking metrics after loading can hide problems in the SavedModel.
  • Confusion matrix miscalculation: Not summing TP, FP, TN, FN correctly leads to wrong precision/recall.
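To avoid the miscounting pitfall, compute all four confusion matrix cells in a single pass over the data, so they always sum to the number of samples. A sketch (labels are illustrative):

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count TP, FN, FP, TN in one pass; the four cells must sum to len(y_true)."""
    counts = Counter(zip(y_true, y_pred))
    tp = counts[(1, 1)]
    fn = counts[(1, 0)]
    fp = counts[(0, 1)]
    tn = counts[(0, 0)]
    assert tp + fn + fp + tn == len(y_true)  # guard against miscounting
    return tp, fn, fp, tn

print(confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # → (2, 1, 1, 1)
```

Deriving precision and recall from these counts, rather than from separate ad hoc tallies, keeps the two metrics consistent with each other.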

Self-check question

Your model saved in SavedModel format has 98% accuracy before saving but only 12% recall on fraud cases after loading. Is it good for production? Why or why not?

Answer: No, it is not good for production. Although accuracy is high, recall on fraud cases is very low, which means the loaded model misses most fraud. High accuracy can coexist with terrible recall when fraud cases are rare. The SavedModel round trip did not preserve the model's ability to detect fraud, so you should investigate the saving/loading steps or version compatibility before deploying.

Key Result
SavedModel format must preserve model accuracy, precision, and recall before and after saving to ensure reliable predictions.