In attention mechanisms, especially in natural language processing, the key metrics depend on the task. For example, in machine translation or text summarization, BLEU or ROUGE scores measure how closely the model's output matches human references. For classification tasks that use attention, accuracy, precision, and recall capture end-task performance; the attention weights themselves can additionally be inspected to see which parts of the input the model focused on.
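As a hedged illustration of a reference-overlap metric, here is a simplified ROUGE-1 recall in plain Python: the fraction of reference unigrams that also appear in the candidate, with counts clipped. Real ROUGE implementations add stemming, multiple references, and F-measure variants, so treat this as a sketch, not the official metric.

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """Simplified ROUGE-1 recall: share of reference unigrams
    covered by the candidate, with clipped counts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clip each word's count so the candidate can't over-claim credit
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / sum(ref.values())

score = rouge1_recall("the cat is on the mat", "the cat sat on the mat")
# 5 of 6 reference tokens are covered
```

The same clipping idea underlies BLEU's modified n-gram precision, just computed in the candidate-to-reference direction instead.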
Attention itself is not a standalone model but a component that lets a model weigh different parts of the input differently. Because of this, metrics that evaluate final task performance (such as translation quality or classification accuracy) are the most important measures of an attention-based system.
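To make the "weighing input parts differently" idea concrete, here is a minimal scaled dot-product attention sketch in plain Python (no deep-learning framework, vectors as lists). The function names and toy vectors are illustrative, not from any particular library.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query.

    Returns (output, weights): the output is a weighted sum of the
    value vectors, and the weights show how much each input position
    contributed.
    """
    d = len(query)
    # Score each key against the query: q . k / sqrt(d)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)  # weights are non-negative and sum to 1
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(len(values[0]))]
    return output, weights

out, w = attention(query=[1.0, 0.0],
                   keys=[[1.0, 0.0], [0.0, 1.0]],
                   values=[[10.0, 0.0], [0.0, 10.0]])
```

In the toy call above, the query aligns with the first key, so the first value vector receives the larger weight; the weights themselves are what evaluation of "where the model attends" would inspect, while the output feeds the downstream task whose metrics (BLEU, accuracy, etc.) ultimately matter.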