When deploying models on mobile devices, the key metrics are model size, inference latency, and accuracy.
Model size matters because mobile devices have limited storage and memory; a smaller model downloads faster, occupies less space on disk, and loads more quickly at app startup.
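As a rough back-of-the-envelope sketch (not tied to any specific framework, and using a hypothetical 5-million-parameter model), on-disk size can be estimated from the parameter count times the bytes per weight; quantizing float32 weights to int8 cuts this estimate by about 4x:

```python
def model_size_mb(num_params: int, bytes_per_param: int) -> float:
    """Rough on-disk size estimate: parameter count * bytes per weight."""
    return num_params * bytes_per_param / (1024 ** 2)

# Hypothetical 5M-parameter model, before and after int8 quantization.
fp32_mb = model_size_mb(5_000_000, 4)  # float32: 4 bytes per weight
int8_mb = model_size_mb(5_000_000, 1)  # int8: 1 byte per weight
print(f"float32: {fp32_mb:.1f} MB, int8: {int8_mb:.1f} MB")
```

Real serialized models also include metadata and graph structure, so actual files are somewhat larger than this weight-only estimate.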
Inference latency is the time the model takes to produce a prediction on the device; lower latency means a more responsive user experience.
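Latency is usually measured by timing repeated inference calls after a few warm-up runs and reporting a robust statistic such as the median. A minimal sketch, with a stand-in function in place of a real on-device model call:

```python
import statistics
import time

def measure_latency_ms(infer, runs: int = 50, warmup: int = 5) -> float:
    """Median wall-clock latency of one inference call, in milliseconds."""
    for _ in range(warmup):
        infer()  # warm up caches and any lazy initialization before timing
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)  # median is robust to OS scheduling jitter

# Hypothetical stand-in for a real model's forward pass.
fake_model = lambda: sum(i * i for i in range(10_000))
print(f"median latency: {measure_latency_ms(fake_model):.2f} ms")
```

On a real device the same idea applies, but measurements should be taken on the target hardware, since desktop timings do not transfer to mobile CPUs or accelerators.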
Accuracy measures how well the model predicts; the goal is to preserve it as much as possible when shrinking or speeding up the model, since compression techniques such as quantization and pruning can degrade it.
Deploying well means balancing all three: a model that is small and fast but inaccurate, or accurate but too large or slow, will not deliver a good experience on a phone.