When optimizing models through distillation or quantization, the key metric to watch is accuracy, or a task-specific quality measure such as F1 score for NLP tasks. These methods shrink the model or speed up inference, but they can introduce small drops in prediction quality, so the goal is to preserve as much accuracy as possible while making the model smaller or faster.
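A minimal sketch of this comparison, assuming a binary classification task: compute F1 on the same held-out labels for the baseline model and its compressed version, then report the drop. The prediction lists below are hypothetical stand-ins for real model outputs.

```python
# Compare prediction quality (F1) of a baseline model vs. its
# quantized/distilled version on the same validation labels.

def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class, computed from scratch (no dependencies)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical outputs on a shared validation set.
y_true           = [1, 0, 1, 1, 0, 1, 0, 0]
baseline_preds   = [1, 0, 1, 1, 0, 1, 0, 1]  # full-precision model
compressed_preds = [1, 0, 1, 0, 0, 1, 0, 1]  # quantized/distilled model

baseline_f1 = f1_score(y_true, baseline_preds)
compressed_f1 = f1_score(y_true, compressed_preds)
print(f"baseline F1:   {baseline_f1:.3f}")    # 0.889
print(f"compressed F1: {compressed_f1:.3f}")  # 0.750
print(f"F1 drop:       {baseline_f1 - compressed_f1:.3f}")
```

In practice you would set an acceptable quality-drop budget (say, one F1 point) and reject compressed variants that exceed it.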
Latency and model size are also important metrics here, but they measure efficiency rather than prediction quality: how fast the model responds and how much storage or memory it occupies after optimization.
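These efficiency metrics can be sketched with the standard library alone. The helpers below are hypothetical: `measure_latency_ms` times repeated calls to any prediction function, and `serialized_size_bytes` reports the on-disk size of a pickled object standing in for model weights.

```python
import os
import pickle
import statistics
import tempfile
import time

def measure_latency_ms(predict_fn, batch, runs=100, warmup=10):
    """Median wall-clock latency of predict_fn(batch) in milliseconds."""
    for _ in range(warmup):          # warm caches before timing
        predict_fn(batch)
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_fn(batch)
        times.append((time.perf_counter() - start) * 1e3)
    return statistics.median(times)  # median is robust to timing outliers

def serialized_size_bytes(obj):
    """Size of the object when serialized to disk, as a proxy for model size."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        pickle.dump(obj, f)
        path = f.name
    size = os.path.getsize(path)
    os.remove(path)
    return size

# Hypothetical stand-ins for real weights and a real inference function.
dummy_weights = [0.0] * 10_000
predict = lambda batch: [sum(batch)]

print(f"median latency:  {measure_latency_ms(predict, list(range(64))):.4f} ms")
print(f"serialized size: {serialized_size_bytes(dummy_weights)} bytes")
```

Reporting both sets of numbers side by side (quality before vs. after, latency before vs. after, size before vs. after) makes the accuracy/efficiency trade-off of a compressed model explicit.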