Experiment - Why summarization condenses information
Problem: We want to create a model that summarizes long texts into shorter versions while keeping the main ideas.
Current Metrics: Training loss: 0.15, Validation loss: 0.40, Training ROUGE-1: 85%, Validation ROUGE-1: 60%
Issue: The model overfits. It performs very well on the training data (loss 0.15, ROUGE-1 85%) but much worse on the validation data (loss 0.40, ROUGE-1 60%), which means it does not generalize well to new texts.
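The gap between training and validation scores can be made explicit with a small calculation. The sketch below uses only the four numbers reported above. The function names and the thresholds (`max_loss_gap`, `max_rouge_gap`) are illustrative assumptions, not part of the experiment, and a reasonable cutoff depends on the dataset and model.

```python
# Metrics as reported in the experiment; ROUGE-1 is expressed in percent.
metrics = {
    "train": {"loss": 0.15, "rouge1": 85.0},
    "validation": {"loss": 0.40, "rouge1": 60.0},
}


def generalization_gap(metrics):
    """Return (loss gap, ROUGE-1 gap) between training and validation.

    Both gaps are defined so that a larger positive value means the
    model does worse on validation data than on training data.
    """
    loss_gap = metrics["validation"]["loss"] - metrics["train"]["loss"]
    rouge_gap = metrics["train"]["rouge1"] - metrics["validation"]["rouge1"]
    return loss_gap, rouge_gap


def looks_overfit(metrics, max_loss_gap=0.10, max_rouge_gap=10.0):
    """Flag overfitting when either gap exceeds its threshold.

    The default thresholds are hypothetical values chosen for illustration.
    """
    loss_gap, rouge_gap = generalization_gap(metrics)
    return loss_gap > max_loss_gap or rouge_gap > max_rouge_gap


loss_gap, rouge_gap = generalization_gap(metrics)
print(f"loss gap: {loss_gap:.2f}, ROUGE-1 gap: {rouge_gap:.1f} points")
print("overfitting suspected:", looks_overfit(metrics))
```

With the reported metrics, the validation loss is 0.25 higher than the training loss and validation ROUGE-1 is 25 points lower, so both gaps exceed the illustrative thresholds.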