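In NLP (Natural Language Processing) in general, many tasks, such as text classification or part-of-speech tagging, assign a label to each item. For these we often report overall accuracy or error rate: the fraction of items (tokens, sentences, or documents) that the system labels correctly or incorrectly.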
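As a minimal sketch, accuracy and error rate are simple fractions over aligned gold and predicted labels. The helper names `accuracy` and `error_rate` and the tag sequences are hypothetical, invented for illustration:

```python
def _check_aligned(gold, predicted):
    """Both sequences must be non-empty and label the same items in order."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")


def accuracy(gold, predicted):
    """Fraction of items whose predicted label matches the gold label."""
    _check_aligned(gold, predicted)
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def error_rate(gold, predicted):
    """Fraction of items whose predicted label differs from the gold label."""
    _check_aligned(gold, predicted)
    return sum(g != p for g, p in zip(gold, predicted)) / len(gold)


# Hypothetical part-of-speech tags for a five-token sentence.
gold_tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]
pred_tags = ["DET", "NOUN", "NOUN", "DET", "NOUN"]
print(accuracy(gold_tags, pred_tags))    # 0.8
print(error_rate(gold_tags, pred_tags))  # 0.2
```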
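For NLU (Natural Language Understanding), precision, recall, and F1 score are usually the key metrics. Understanding means correctly identifying the meaning or intent of an input, so we want to balance finding all the correct meanings (recall: the fraction of true cases the model identifies) against avoiding wrong ones (precision: the fraction of the model's predictions that are correct). F1 is the harmonic mean of the two, so it stays low unless both are reasonably high.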
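The sketch below computes precision, recall, and F1 for a single intent, treating every other intent as negative (one-vs-rest). The intent names, the predictions, and the function name `precision_recall_f1` are made up for illustration; in practice a library routine such as scikit-learn's `precision_recall_fscore_support` would usually be used instead.

```python
def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and F1 for one label, treating all other labels as negative."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)  # correctly found
    fp = sum(g != positive and p == positive for g, p in pairs)  # wrongly claimed
    fn = sum(g == positive and p != positive for g, p in pairs)  # missed
    # Return 0.0 when a ratio is undefined (a common convention).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical intent labels for six user utterances.
gold = ["book_flight", "book_flight", "weather", "book_flight", "weather", "greeting"]
pred = ["book_flight", "weather", "weather", "book_flight", "book_flight", "book_flight"]

p, r, f = precision_recall_f1(gold, pred, positive="book_flight")
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
# precision=0.500 recall=0.667 f1=0.571
```

Here the model finds two of the three true `book_flight` utterances (recall 2/3), but half of its `book_flight` predictions are wrong (precision 1/2), giving F1 = 2PR/(P+R) = 4/7.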
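In NLG (Natural Language Generation), quality is more subjective, because many different outputs can be acceptable for the same input. Automatic metrics such as BLEU and ROUGE compare the generated text with one or more human-written reference texts by counting overlapping n-grams (contiguous word sequences): BLEU is precision-oriented, while ROUGE is largely recall-oriented. They therefore measure surface similarity to the references rather than meaning, which is why they are often complemented by human judgments.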
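To make the n-gram comparison concrete, here is a simplified sketch of the quantity both metrics build on: clipped n-gram overlap between a candidate and a single reference. It is not a full implementation of either metric. It assumes lowercasing, whitespace tokenization, and one reference; real BLEU also takes a geometric mean over n = 1 to 4 and applies a brevity penalty. The function names are hypothetical, and for reportable scores established packages such as sacreBLEU or Google's rouge-score are the safer choice.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_overlap_scores(candidate, reference, n=1):
    """Return (clipped precision, recall) of candidate n-grams against one reference.

    Precision is the BLEU-style "modified" n-gram precision; recall is the
    ROUGE-N-style recall. Simplified: lowercase + whitespace tokenization,
    a single reference, no brevity penalty, no averaging over several n.
    """
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    # Counter '&' keeps the minimum count, so each candidate n-gram is
    # credited at most as many times as it appears in the reference.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall


reference = "the cat sat on the mat"
candidate = "the cat is on the mat today"
for n in (1, 2):
    p, r = ngram_overlap_scores(candidate, reference, n)
    print(f"n={n}: precision={p:.3f} recall={r:.3f}")
# n=1: precision=0.714 recall=0.833
# n=2: precision=0.500 recall=0.600
```

Compared with the candidate "the cat is on the mat", the extra word "today" lowers precision (more candidate n-grams that do not appear in the reference) but leaves recall unchanged, which illustrates why precision-based and recall-based metrics can rank the same outputs differently.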