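For summarization, the main goal is to produce a short text that preserves the important ideas of a long document. We use ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores to check this. ROUGE compares the generated summary with one or more human-written reference summaries by counting overlapping words or word sequences (n-grams). ROUGE-1 counts matching single words (unigrams), ROUGE-2 counts matching pairs of adjacent words (bigrams), and ROUGE-L measures the longest common subsequence: the longest run of words that appear in both texts in the same order, though not necessarily next to each other. Because these scores measure word overlap rather than meaning, they indicate how much of the reference's key content the model keeps, but they cannot credit a correct paraphrase that uses different words.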
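To make the counting concrete, here is a minimal sketch of ROUGE-1, ROUGE-2, and ROUGE-L in their recall form (matches divided by the size of the reference), assuming simple lowercase whitespace tokenization with no stemming. The helper names (`tokenize`, `ngrams`, `rouge_n_recall`, `lcs_length`, `rouge_l_recall`) are hypothetical and chosen for illustration; reference implementations such as the `rouge-score` Python package apply their own tokenization and optional stemming, so their numbers can differ.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split; no stemming or punctuation stripping.
    return text.lower().split()


def ngrams(tokens: list[str], n: int) -> Counter:
    # Count every contiguous sequence of n tokens.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, reference: str, n: int) -> float:
    # Share of the reference's n-grams that also occur in the candidate.
    # Counter '&' keeps the minimum count, so repeated n-grams are not over-credited.
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    return sum((cand & ref).values()) / total


def lcs_length(a: list[str], b: list[str]) -> int:
    # Dynamic programming over prefixes: matched tokens must keep their order
    # but do not have to be adjacent.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate: str, reference: str) -> float:
    # LCS length divided by the reference length.
    ref = tokenize(reference)
    if not ref:
        return 0.0
    return lcs_length(tokenize(candidate), ref) / len(ref)


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(round(rouge_n_recall(candidate, reference, 1), 3))  # 0.833
print(round(rouge_n_recall(candidate, reference, 2), 3))  # 0.6
print(round(rouge_l_recall(candidate, reference), 3))     # 0.833
```

Here a single substituted word ("lay" for "sat") costs one of six unigrams but two of five bigrams, which is why ROUGE-2 drops faster than ROUGE-1 when the wording changes.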
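ROUGE scores are usually reported as precision, recall, and F1 (their harmonic mean), and looking at precision and recall separately helps us tell whether a summary is too short (missing information) or too long (extra information). Precision is the fraction of the summary's words or n-grams that also appear in the reference, so a summary padded with extra detail gets low precision. Recall is the fraction of the reference's words or n-grams that the summary covers, so a summary that leaves out important information gets low recall.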
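The sketch below, again assuming lowercase whitespace tokenization and using a hypothetical `rouge1_prf` helper, computes ROUGE-1 precision, recall, and F1 and shows how the two failure modes separate. The example sentences are made up for illustration: the too-short summary keeps perfect precision but loses recall, while the padded one keeps full recall but loses precision.

```python
from collections import Counter


def rouge1_prf(candidate: str, reference: str) -> tuple[float, float, float]:
    # Simplified ROUGE-1: lowercase whitespace tokens, clipped unigram overlap.
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


reference = "the new policy cuts emissions by half before 2030"
too_short = "policy cuts emissions"
too_long = ("the new policy cuts emissions by half before 2030 according to "
            "officials speaking at a press event on monday")

# Each line prints [precision, recall, f1].
print([round(x, 3) for x in rouge1_prf(too_short, reference)])  # [1.0, 0.333, 0.5]
print([round(x, 3) for x in rouge1_prf(too_long, reference)])   # [0.474, 1.0, 0.643]
```

Both summaries get a middling F1 (0.5 and 0.643), but only the separate precision and recall values reveal which one is missing content and which one is padded.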