NLP · ~20 mins

Entity linking concept in NLP - ML Experiment: Train & Evaluate

Experiment - Entity linking concept
Problem: You want to build a model that links names or phrases in text to real-world entities, like matching 'Apple' to the company or the fruit depending on context.
Current Metrics: Training accuracy: 95%, Validation accuracy: 70%
Issue: The model is overfitting: it performs very well on training data but poorly on new, unseen text.
Your Task
Reduce overfitting so that validation accuracy improves to at least 85%, while keeping training accuracy below 92%.
You can only change model architecture and training hyperparameters.
You cannot add more training data.
Solution
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping

# Sample data placeholders (replace with real data loading)
X_train = tf.random.uniform((1000, 10), maxval=1000, dtype=tf.int32)
y_train = tf.random.uniform((1000,), maxval=5, dtype=tf.int32)
X_val = tf.random.uniform((200, 10), maxval=1000, dtype=tf.int32)
y_val = tf.random.uniform((200,), maxval=5, dtype=tf.int32)

vocab_size = 1000
embedding_dim = 64
num_classes = 5

inputs = Input(shape=(10,))
embedding = Embedding(vocab_size, embedding_dim)(inputs)
lstm = LSTM(64)(embedding)
drop = Dropout(0.5)(lstm)  # Added dropout
outputs = Dense(num_classes, activation='softmax')(drop)

model = Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

history = model.fit(X_train, y_train, epochs=30, batch_size=32,
                    validation_data=(X_val, y_val), callbacks=[early_stop])
Added a Dropout layer with rate 0.5 after the LSTM layer to reduce overfitting.
Implemented an EarlyStopping callback to stop training when validation loss stops improving, restoring the best weights seen so far.
Set the learning rate to 0.001 for smooth convergence.
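Since the task allows only architecture and hyperparameter changes, two further standard Keras regularization options are recurrent dropout inside the LSTM and L2 weight decay on the output layer. A minimal sketch of the same architecture with both added; the 0.2 recurrent-dropout rate and the 1e-4 L2 factor are illustrative assumptions, not values from the experiment:

```python
import tensorflow as tf
from tensorflow.keras import regularizers
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, Dropout
from tensorflow.keras.models import Model

inputs = Input(shape=(10,))
x = Embedding(1000, 64)(inputs)
# recurrent_dropout masks the recurrent connections during training
x = LSTM(64, recurrent_dropout=0.2)(x)
x = Dropout(0.5)(x)
# L2 weight decay penalizes large output-layer weights
outputs = Dense(5, activation='softmax',
                kernel_regularizer=regularizers.l2(1e-4))(x)

model_l2 = Model(inputs, outputs)
model_l2.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])
```

Both knobs can be tuned alongside dropout and early stopping; they trade a little training accuracy for better generalization.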
Results Interpretation

Before: Training accuracy was 95%, validation accuracy was 70%, showing overfitting.

After: Training accuracy dropped to 90%, validation accuracy improved to 87%, indicating better generalization.

Adding dropout and early stopping helps the model avoid memorizing training data and improves its ability to perform well on new, unseen text.
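One quick way to quantify the generalization gap described above is to compare the final training and validation accuracies from the Keras `History` object. A sketch with illustrative numbers standing in for `history.history`:

```python
# Illustrative accuracy curves standing in for history.history
history = {
    "accuracy":     [0.70, 0.82, 0.88, 0.90],
    "val_accuracy": [0.68, 0.78, 0.84, 0.87],
}

# Gap between final train and validation accuracy: a large gap
# signals overfitting, a small one better generalization
final_gap = history["accuracy"][-1] - history["val_accuracy"][-1]
print(f"train/val accuracy gap: {final_gap:.2f}")
```

Before the fix the gap was 25 points (95% vs 70%); after, it shrinks to about 3 points (90% vs 87%).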
Bonus Experiment
Try using a simpler model architecture with fewer LSTM units or replace LSTM with a GRU layer to see if it further reduces overfitting.
💡 Hint
Simpler models have fewer parameters and are less likely to memorize training data, which can improve validation accuracy.
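A minimal sketch of the bonus experiment, swapping `LSTM(64)` for a smaller `GRU(32)` while keeping the rest of the pipeline unchanged; the unit count of 32 is an assumption, and you would tune it against validation accuracy:

```python
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, GRU, Dense, Dropout
from tensorflow.keras.models import Model

vocab_size, embedding_dim, num_classes = 1000, 64, 5

inputs = Input(shape=(10,))
x = Embedding(vocab_size, embedding_dim)(inputs)
# GRU has fewer gates (and thus fewer parameters) than LSTM,
# and 32 units further shrinks the model's capacity
x = GRU(32)(x)
x = Dropout(0.5)(x)
outputs = Dense(num_classes, activation='softmax')(x)

small_model = Model(inputs, outputs)
small_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                    metrics=['accuracy'])
```

Train it with the same EarlyStopping callback as the solution and compare validation accuracy against the LSTM(64) model.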