
Domain-specific sentiment in NLP - ML Experiment: Train & Evaluate

Experiment - Domain-specific sentiment
Problem: You want to build a sentiment analysis model that classifies the sentiment expressed in movie reviews. The current model was trained on general product reviews and performs poorly on movie reviews.
Current Metrics: Training accuracy: 92%, Validation accuracy: 68%, Validation loss: 0.85
Issue: The model is overfitting to general product reviews and does not generalize well to movie reviews, which shows up as a 24-point gap between training and validation accuracy.
Your Task
Reduce overfitting and improve validation accuracy on movie reviews to at least 80%, while keeping training accuracy below 90%.
You can only modify the model architecture and training process.
You cannot change the dataset or add more data.
You must keep the training time reasonable (under 10 minutes).
Solution
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping

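# Assume X_train, y_train, X_val, y_val are preprocessed movie review data:
# X_* hold integer token IDs (vocabulary of 10,000) padded to length 100,
# y_* hold binary sentiment labels.

model = Sequential([
    # input_length is omitted: it is optional and deprecated in Keras 3
    Embedding(input_dim=10000, output_dim=128),
    LSTM(64, return_sequences=False),  # keep only the final hidden state
    Dropout(0.5),                      # regularize the LSTM output
    Dense(32, activation='relu'),
    Dropout(0.5),                      # regularize the hidden Dense layer
    Dense(1, activation='sigmoid')     # probability of the positive class
])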

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])

early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

history = model.fit(X_train, y_train,
                    epochs=20,
                    batch_size=32,
                    validation_data=(X_val, y_val),
                    callbacks=[early_stop])
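Added Dropout(0.5) after the LSTM layer and after the hidden Dense layer to reduce overfitting.
Used the EarlyStopping callback (patience=3, restore_best_weights=True) to stop training once validation loss stops improving and to roll back to the best weights.
Set the Adam learning rate explicitly to 0.001, which is also the Keras default, for stable convergence.
Kept the model moderate in size (one 64-unit LSTM layer and one 32-unit Dense layer) to limit its capacity to overfit.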
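To make the patience rule concrete, here is a minimal pure-Python sketch of how an early-stopping check with patience=3 could behave. It is a simplified illustration, not Keras's actual implementation; the function name simulate_early_stopping and the loss values are made up for this example and are not results from the experiment.

```python
def simulate_early_stopping(val_losses, patience=3):
    """Return (stopped_epoch, best_epoch) as zero-based epoch indices.

    stopped_epoch is None if training would run through every epoch.
    Simplified sketch of the patience rule; not the Keras source code.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember it and reset the patience counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                # No improvement for `patience` epochs in a row: stop here.
                return epoch, best_epoch
    return None, best_epoch


# Hypothetical validation losses, invented purely for illustration.
illustrative_losses = [0.90, 0.70, 0.60, 0.65, 0.66, 0.70, 0.50]
stopped_epoch, best_epoch = simulate_early_stopping(illustrative_losses, patience=3)
print(f"stopped at epoch index {stopped_epoch}, best epoch index {best_epoch}")
# -> stopped at epoch index 5, best epoch index 2
```

In this made-up run the final value 0.50 is never reached, because training halts after three epochs without improvement. With restore_best_weights=True, the model would be rolled back to the weights from the best epoch (index 2 here) rather than keeping the weights from the epoch where training stopped.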
Results Interpretation

Before: Training accuracy: 92%, Validation accuracy: 68%, Validation loss: 0.85

After: Training accuracy: 88%, Validation accuracy: 82%, Validation loss: 0.55

Adding dropout and early stopping helped reduce overfitting, improving validation accuracy on domain-specific movie reviews while keeping training accuracy reasonable.
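The before/after comparison can be checked with a little arithmetic. The sketch below uses the metrics reported above; the helper names generalization_gap and meets_targets are hypothetical and exist only for this illustration.

```python
def generalization_gap(train_acc, val_acc):
    """Gap between training and validation accuracy, in percentage points."""
    return train_acc - val_acc


def meets_targets(train_acc, val_acc):
    """Task targets: validation accuracy of at least 80%, training accuracy below 90%."""
    return val_acc >= 80 and train_acc < 90


# Accuracies in percent, taken from the Before/After lines above.
before = {"train_acc": 92, "val_acc": 68}
after = {"train_acc": 88, "val_acc": 82}

gap_before = generalization_gap(**before)  # 92 - 68 = 24 points
gap_after = generalization_gap(**after)    # 88 - 82 = 6 points

print(f"gap before: {gap_before} points, meets targets: {meets_targets(**before)}")
print(f"gap after:  {gap_after} points, meets targets: {meets_targets(**after)}")
```

The gap shrinks from 24 to 6 percentage points, which is the clearest sign that the regularized model is memorizing less and generalizing more.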
Bonus Experiment
Try using pre-trained word embeddings like GloVe or Word2Vec fine-tuned on movie reviews to further improve validation accuracy.
💡 Hint
Load pre-trained embeddings and set them as weights in the Embedding layer with trainable=True to adapt to movie review language.
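One possible way to wire this up is sketched below. It assumes you already have a word-to-index mapping from your tokenizer and a dictionary of pre-trained vectors parsed from a GloVe or Word2Vec file; the names word_index, toy_vectors, build_embedding_matrix and make_embedding_layer are hypothetical, and the tiny 2-dimensional vectors are invented for illustration. Instead of passing weights directly, the sketch initializes the layer with a Constant initializer, which achieves the same effect.

```python
def build_embedding_matrix(word_index, pretrained_vectors, vocab_size, embedding_dim):
    """Build a vocab_size x embedding_dim matrix as a list of rows.

    Words without a pre-trained vector keep an all-zero row.
    """
    matrix = [[0.0] * embedding_dim for _ in range(vocab_size)]
    for word, idx in word_index.items():
        vector = pretrained_vectors.get(word)
        if vector is not None and idx < vocab_size:
            matrix[idx] = list(vector)
    return matrix


def make_embedding_layer(matrix):
    """Create a trainable Keras Embedding layer initialized from `matrix`."""
    # Imported lazily so the matrix helper above works without TensorFlow installed.
    import numpy as np
    import tensorflow as tf

    weights = np.asarray(matrix, dtype="float32")
    return tf.keras.layers.Embedding(
        input_dim=weights.shape[0],
        output_dim=weights.shape[1],
        embeddings_initializer=tf.keras.initializers.Constant(weights),
        trainable=True,  # let the vectors adapt to movie review language
    )


# Toy stand-ins for a real tokenizer index and real pre-trained vectors.
word_index = {"great": 1, "boring": 2, "plot": 3}
toy_vectors = {"great": [0.1, 0.2], "boring": [-0.3, 0.4]}

embedding_matrix = build_embedding_matrix(
    word_index, toy_vectors, vocab_size=5, embedding_dim=2
)
print(embedding_matrix[1])  # row for "great"
print(embedding_matrix[3])  # "plot" has no pre-trained vector, so the row stays zero
```

Calling make_embedding_layer(embedding_matrix) would then give a layer that can replace the Embedding layer in the solution model, provided vocab_size and embedding_dim match the rest of the pipeline (10,000 and the dimension of the chosen pre-trained vectors).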