TensorFlow Lite Conversion - Model Metrics & Evaluation
When converting a TensorFlow model to TensorFlow Lite, the key metrics to watch are model size, inference latency, and accuracy drop. Model size matters because smaller models fit on devices with limited storage and memory. Latency matters because users expect fast responses on mobile and embedded devices. Accuracy drop measures how much the conversion degraded the model's predictions. The goal is to keep accuracy close to the original while reducing size and latency.
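The first two metrics can be measured with plain Python. Below is a minimal sketch: `benchmark`, `predict`, and `model_path` are hypothetical names for illustration, and `predict` stands in for whatever inference call you have (e.g., wrapping `tf.lite.Interpreter`'s `invoke()` in practice).

```python
import os
import time

def benchmark(predict, inputs, model_path):
    """Measure model file size (bytes) and mean per-call latency (ms).

    predict:     any callable that runs one inference
    inputs:      a list of sample inputs to time against
    model_path:  path to the saved model file on disk
    """
    size_bytes = os.path.getsize(model_path)
    start = time.perf_counter()
    for x in inputs:
        predict(x)
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / len(inputs) * 1000
    return size_bytes, latency_ms
```

Run this for both the original and the converted model on the same inputs, then compare the two (size, latency) pairs.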
Before and after conversion, compare the model's predictions on the same test data. For example, a confusion matrix for classification tasks can show if the TensorFlow Lite model makes more mistakes.
Original model confusion matrix:        TP=90  FP=10  FN=5  TN=95
TensorFlow Lite model confusion matrix: TP=88  FP=12  FN=7  TN=93
Total samples = 90 + 10 + 5 + 95 = 200
This shows a slight drop in true positives and increase in false positives after conversion, indicating a small accuracy loss.
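Working the accuracy numbers out from those two matrices (a quick arithmetic check, nothing TF-specific):

```python
def accuracy(tp, fp, fn, tn):
    """Overall accuracy from confusion-matrix counts."""
    return (tp + tn) / (tp + fp + fn + tn)

orig = accuracy(90, 10, 5, 95)   # 185/200 = 0.925
lite = accuracy(88, 12, 7, 93)   # 181/200 = 0.905
drop = orig - lite               # 0.02, i.e. a 2-point accuracy drop
```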
After conversion, precision and recall might change. For example, if the original model had precision 90/(90+10) = 0.90 and recall 90/(90+5) ≈ 0.947, the TensorFlow Lite model might have precision 88/(88+12) = 0.88 and recall 88/(88+7) ≈ 0.926.
This small drop means the converted model is slightly less precise and catches slightly fewer of the actual positives. The tradeoff is acceptable if the converted model is meaningfully smaller and faster on the device.
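The precision and recall computations above can be expressed directly from the confusion-matrix counts:

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

p_orig, r_orig = precision_recall(90, 10, 5)   # 0.90, ~0.947
p_lite, r_lite = precision_recall(88, 12, 7)   # 0.88, ~0.926
```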
Good: Model size reduced by 50% or more, latency reduced by 30% or more, and accuracy drop less than 2%. Precision and recall remain above 85%.
Bad: Model size barely changes, latency stays high, or accuracy drops more than 5%, causing poor predictions. Precision or recall below 80% means the model is less reliable after conversion.
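The Good/Bad thresholds above can be folded into a single acceptance check. This is a sketch under the stated rules of thumb; the function name and parameterization (ratios of converted over original) are my own, not a TensorFlow API.

```python
def conversion_acceptable(size_ratio, latency_ratio, acc_drop, precision, recall):
    """Rules of thumb for a good conversion.

    size_ratio / latency_ratio: converted-over-original (lower is better)
    acc_drop, precision, recall: fractions in [0, 1]
    """
    return (
        size_ratio <= 0.5          # size reduced by 50% or more
        and latency_ratio <= 0.7   # latency reduced by 30% or more
        and acc_drop < 0.02        # accuracy drop under 2%
        and precision > 0.85
        and recall > 0.85
    )
```

For example, a model that shrinks to 40% of its size, runs in 60% of the time, loses 1.5 points of accuracy, and keeps precision 0.88 and recall 0.93 passes the check.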
- Ignoring accuracy drop: Focusing only on size and speed can hide big accuracy losses.
- Data leakage: Testing on data used during training can give false high accuracy.
- Overfitting indicators: If the converted model performs well on training data but poorly on new data, the original model was likely already overfitting; conversion neither causes nor fixes this, but it can make the evaluation misleading.
- Not testing on real device: Metrics measured on desktop may not reflect real device performance.
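A cheap sanity check against several of these pitfalls is to compare the two models' predictions example by example on a held-out test set (data never seen in training). The helper below is a plain-Python sketch; `preds_a` and `preds_b` are assumed to be lists of predicted labels from the original and converted models on the same inputs.

```python
def agreement_rate(preds_a, preds_b):
    """Fraction of held-out examples where both models predict the same label."""
    assert len(preds_a) == len(preds_b), "prediction lists must align"
    same = sum(a == b for a, b in zip(preds_a, preds_b))
    return same / len(preds_a)

# e.g. models agree on 3 of 4 examples:
rate = agreement_rate([1, 0, 1, 1], [1, 0, 0, 1])  # 0.75
```

A low agreement rate on held-out data is a red flag even before you compute accuracy, and it pinpoints exactly which inputs to inspect.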
Your TensorFlow Lite model has 98% accuracy but 12% recall on fraud detection. Is it good for production? Why or why not?
Answer: No, it is not good. Although accuracy is high, recall is very low. This means the model misses most fraud cases, which is dangerous. For fraud detection, high recall is critical to catch as many frauds as possible.
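To see how 98% accuracy can coexist with 12% recall, here is an invented illustration (the counts are hypothetical, chosen only to match the question's numbers): 10,000 transactions, of which 25 are fraud.

```python
# Hypothetical counts: 10,000 transactions, 25 actual frauds.
tp, fn = 3, 22        # the model catches only 3 of 25 frauds
fp, tn = 178, 9797    # and flags 178 legitimate transactions

accuracy = (tp + tn) / (tp + fp + fn + tn)   # 9800/10000 = 0.98
recall = tp / (tp + fn)                      # 3/25 = 0.12
```

Because frauds are so rare, the model earns high accuracy almost entirely from true negatives while missing 22 of 25 frauds, which is exactly why recall, not accuracy, is the metric to watch here.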