0
0
NLPml~20 mins

Why summarization condenses information in NLP - Experiment to Prove It

Choose your learning style9 modes available
Experiment - Why summarization condenses information
Problem:We want to create a model that summarizes long text into shorter versions while keeping the main ideas.
Current Metrics:Training loss: 0.15, Validation loss: 0.40, Training ROUGE-1: 85%, Validation ROUGE-1: 60%
Issue:The model overfits: it performs very well on training data but poorly on validation data, meaning it does not generalize well to new texts.
Your Task
Reduce overfitting so that validation ROUGE-1 score improves to at least 75% while keeping training ROUGE-1 below 85%.
You can only change model architecture and training hyperparameters.
Do not change the dataset or preprocessing steps.
Hint 1
Hint 2
Hint 3
Solution
NLP
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping

# Sample data placeholders (replace with actual data loading)
X_train, y_train = ...  # tokenized input and target sequences
X_val, y_val = ...

# Model architecture with dropout to reduce overfitting
input_seq = Input(shape=(None,))
embedding = tf.keras.layers.Embedding(input_dim=5000, output_dim=128)(input_seq)
lstm1 = LSTM(256, return_sequences=True)(embedding)
drop1 = Dropout(0.3)(lstm1)
lstm2 = LSTM(256, return_sequences=True)(drop1)
drop2 = Dropout(0.3)(lstm2)
output = Dense(5000, activation='softmax')(drop2)

model = Model(inputs=input_seq, outputs=output)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Early stopping to avoid overfitting
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# Train model
model.fit(X_train, y_train, epochs=20, batch_size=64, validation_data=(X_val, y_val), callbacks=[early_stop])
Added dropout layers after LSTM layers to reduce overfitting.
Reduced learning rate from 0.001 to 0.0005 for smoother training.
Added early stopping to stop training when validation loss stops improving.
Results Interpretation

Before: Training ROUGE-1: 85%, Validation ROUGE-1: 60% (high overfitting)

After: Training ROUGE-1: 83%, Validation ROUGE-1: 77% (better generalization)

Adding dropout and lowering learning rate helps the model generalize better by reducing overfitting, which improves the quality of summaries on new texts.
Bonus Experiment
Try using a pre-trained transformer model like BERT or T5 for summarization and compare results.
💡 Hint
Pre-trained models have learned language patterns from large data and often summarize better with less training.