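When serving NLP models, the key metrics depend on the task. For classification tasks such as sentiment analysis, accuracy, precision, and recall matter because they measure prediction quality: accuracy is the fraction of all predictions that are correct, precision is the fraction of predicted positives that are truly positive, and recall is the fraction of actual positives the model finds.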
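As a rough illustration of how these three classification metrics are computed, here is a minimal sketch in plain Python. The function name `classification_metrics` and the sample labels are made up for this example, and it assumes binary labels with `1` as the positive class (say, "positive sentiment").

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels.

    Hypothetical helper for illustration; `positive` marks the class
    treated as the positive label.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up labels: 1 = positive sentiment, 0 = negative sentiment.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In practice a library such as scikit-learn would likely be used instead; the hand-rolled version is only meant to make the definitions concrete.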
For generation tasks such as chatbots, response latency (the time between a user's request and the model's reply) is critical to ensure users get quick answers.
Overall, latency measures how quickly the model responds to a single request, and throughput measures how many requests it can handle per unit of time; both are vital for a smooth user experience.
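To make latency and throughput concrete, the following sketch times a stand-in model call and summarizes the results. Everything named here (`measure`, `summarize`, `fake_predict`) is hypothetical, and the requests are sent one at a time; a real serving setup would usually handle concurrent requests, so treat the throughput figure as illustrative only.

```python
import math
import time


def measure(predict, requests):
    """Call `predict` on each request in sequence.

    Returns (per-request latencies in seconds, total wall time in seconds).
    """
    latencies = []
    start = time.perf_counter()
    for request in requests:
        t0 = time.perf_counter()
        predict(request)
        latencies.append(time.perf_counter() - t0)
    wall_time = time.perf_counter() - start
    return latencies, wall_time


def summarize(latencies, wall_time):
    """Summarize latency (mean and nearest-rank p95) and throughput."""
    ordered = sorted(latencies)
    rank = max(1, math.ceil(0.95 * len(ordered)))  # nearest-rank percentile
    return {
        "mean_latency_s": sum(ordered) / len(ordered),
        "p95_latency_s": ordered[rank - 1],
        "throughput_rps": len(ordered) / wall_time,  # requests per second
    }


def fake_predict(text):
    """Stand-in for a real model call (assumption: about 1 ms of work)."""
    time.sleep(0.001)
    return "positive"


latencies, wall_time = measure(fake_predict, ["great product"] * 20)
print(summarize(latencies, wall_time))
```

Reporting a tail percentile such as p95 alongside the mean is a common choice, since a few slow responses can hurt user experience even when the average looks fine.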