# Model Packaging (.mar files) in PyTorch - Model Metrics & Evaluation

When packaging a model into a .mar file, the main goal is to keep the model's performance intact after deployment. The key metrics to check are therefore the model's prediction accuracy, loss, and inference speed before and after packaging. This ensures the model behaves the same and runs efficiently in production.
For classification models packaged in .mar files, the confusion matrix before and after packaging should be identical. For example:
|                 | Predicted Positive  | Predicted Negative  |
|-----------------|---------------------|---------------------|
| Actual Positive | True Positive (TP)  | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN)  |
TP + FP + FN + TN = total samples
If the confusion matrix changes after packaging, it means the model predictions changed, indicating a packaging issue.
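One way to check this is to run the same labeled validation inputs through both models and compare the resulting confusion matrices directly. A minimal sketch, where `preds_before` and `preds_after` stand in for the outputs of the original model and the model reloaded from the .mar file:

```python
# Sketch: verify that predictions (and hence the confusion matrix)
# are unchanged after packaging. The label lists below are illustrative.
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Count (actual, predicted) pairs for binary labels."""
    return Counter(zip(y_true, y_pred))

y_true       = [1, 0, 1, 1, 0, 0, 1, 0]
preds_before = [1, 0, 0, 1, 0, 1, 1, 0]  # original model's predictions
preds_after  = [1, 0, 0, 1, 0, 1, 1, 0]  # predictions from the reloaded .mar

cm_before = confusion_matrix(y_true, preds_before)
cm_after  = confusion_matrix(y_true, preds_after)

# Correct packaging changes nothing: the two matrices must match exactly.
assert cm_before == cm_after, "confusion matrix changed -- packaging issue"
print("TP:", cm_after[(1, 1)], "FN:", cm_after[(1, 0)],
      "FP:", cm_after[(0, 1)], "TN:", cm_after[(0, 0)])
```

Note that the four counts always sum to the number of validation samples, matching the identity above.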
Packaging should not affect the tradeoff between precision and recall. For example, if a spam filter model packaged as a .mar file had 90% precision and 85% recall before packaging, it should keep similar values after packaging.
If precision drops, the model marks more legitimate emails as spam (false positives). If recall drops, it misses more spam emails (false negatives). Packaging must preserve this balance.
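A regression check for this balance can compute precision and recall from the confusion-matrix counts on both sides of the packaging step and fail if either drifts. The counts and the 1% tolerance below are assumptions chosen to mirror the 90%/85% spam-filter figures above:

```python
# Sketch of a precision/recall drift check across packaging.
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical counts measured before and after packaging.
p_before, r_before = precision_recall(tp=90, fp=10, fn=16)  # 0.90, ~0.85
p_after,  r_after  = precision_recall(tp=89, fp=10, fn=17)

TOLERANCE = 0.01  # assumed acceptable drift
assert abs(p_before - p_after) < TOLERANCE, "precision drifted after packaging"
assert abs(r_before - r_after) < TOLERANCE, "recall drifted after packaging"
```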
Good: Metrics before and after packaging are nearly identical (e.g., accuracy difference < 1%). Inference speed is stable or improved. No errors during loading or prediction.
Bad: Large drops in accuracy, precision, recall, or F1 score after packaging. Increased latency or failures when loading the .mar file. These symptoms indicate that packaging corrupted the model weights or that the serving environment does not match the training environment.
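The "good" criteria above can be turned into an automated check: evaluate both models on the same fixed validation set and assert that accuracy stays within the 1% band and latency does not regress. A minimal sketch; `evaluate` and the two `predict` callables are hypothetical stand-ins for the original model and the model reloaded from the .mar file:

```python
# Sketch: accuracy/latency sanity check across packaging (names are assumptions).
import time

def evaluate(predict, samples, labels):
    start = time.perf_counter()
    preds = [predict(x) for x in samples]
    latency_ms = (time.perf_counter() - start) * 1000 / len(samples)
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    return accuracy, latency_ms

# Stand-ins for the original model and the model reloaded from the .mar file.
original_predict = lambda x: x > 0
packaged_predict = lambda x: x > 0

samples = [-2, -1, 0, 1, 2, 3]
labels  = [False, False, False, True, True, True]

acc_before, _lat_before = evaluate(original_predict, samples, labels)
acc_after,  _lat_after  = evaluate(packaged_predict, samples, labels)

assert abs(acc_before - acc_after) < 0.01, "accuracy regressed after packaging"
print(f"accuracy {acc_before:.2%} -> {acc_after:.2%}")
```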
- Accuracy Paradox: High accuracy but poor recall or precision after packaging can hide problems.
- Data Leakage: Testing metrics on training data can falsely show no change after packaging.
- Overfitting Indicators: If metrics improve unrealistically after packaging, it may indicate evaluation on the wrong dataset (e.g., the training set instead of a held-out set).
- Environment Differences: Differences in hardware or software versions can cause metric changes unrelated to packaging.
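The accuracy-paradox pitfall is easy to demonstrate with arithmetic. On a dataset that is 98% negative, a degenerate model that always predicts "negative" scores 98% accuracy while catching zero positives (the class split below is illustrative):

```python
# Accuracy paradox: high accuracy, zero recall on an imbalanced dataset.
y_true = [1] * 2 + [0] * 98   # 2 positives, 98 negatives
y_pred = [0] * 100            # degenerate model: always predicts negative

accuracy = sum(p == y for p, y in zip(y_pred, y_true)) / len(y_true)
tp = sum(1 for p, y in zip(y_pred, y_true) if p == 1 and y == 1)
fn = sum(1 for p, y in zip(y_pred, y_true) if p == 0 and y == 1)
recall = tp / (tp + fn)

print(accuracy)  # 0.98
print(recall)    # 0.0
```

This is why accuracy alone, before or after packaging, is not enough: recall and precision must be tracked as well.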
Your model packaged as a .mar file shows 98% accuracy but only 12% recall on fraud detection. Is it good for production? Why or why not?
Answer: No, it is not good. Although accuracy is high, the recall is very low, meaning the model misses most fraud cases. In fraud detection, recall is critical to catch as many frauds as possible. Either the model already had poor recall before packaging, or packaging degraded it; comparing recall before and after packaging isolates the cause. In both cases the model should not ship to production in this state.
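A set of counts consistent with this scenario makes the problem concrete. Assuming an illustrative 10,000 transactions with 200 frauds (these numbers are not from the question, just one arrangement that yields 98% accuracy and 12% recall):

```python
# Illustrative counts: 10,000 transactions, 200 frauds.
tp, fn, fp, tn = 24, 176, 24, 9776

accuracy  = (tp + tn) / (tp + fn + fp + tn)
recall    = tp / (tp + fn)
precision = tp / (tp + fp)
f1        = 2 * precision * recall / (precision + recall)

print(accuracy, recall, round(f1, 3))  # 0.98 0.12 0.194
```

The F1 score of roughly 0.19 exposes what the 98% accuracy hides: 176 of the 200 frauds go undetected.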