GRU for text in NLP - ML Experiment: Train & Evaluate

Experiment - GRU for text

Problem: We want to classify movie reviews as positive or negative using a GRU (Gated Recurrent Unit) model on text data.

Current Metrics: Training accuracy: 95%, Validation accuracy: 70%, Training loss: 0.15, Validation loss: 0.60

Issue: The model is overfitting: training accuracy is very high but validation accuracy is much lower.
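One rough way to quantify the issue is to compare the training and validation metrics directly. The sketch below is only a heuristic: the function names and the 5-point `max_gap` threshold are assumptions chosen for illustration, not part of the exercise.

```python
def generalization_gap(train_acc, val_acc):
    """Difference between training and validation accuracy (as fractions)."""
    return train_acc - val_acc


def looks_overfit(train_acc, val_acc, train_loss, val_loss, max_gap=0.05):
    """Heuristic check: validation lags training on both accuracy and loss.

    max_gap is an assumed threshold, not a standard value.
    """
    return generalization_gap(train_acc, val_acc) > max_gap and val_loss > train_loss


# Metrics reported above: 95% / 70% accuracy, 0.15 / 0.60 loss.
gap = generalization_gap(0.95, 0.70)
overfit = looks_overfit(0.95, 0.70, train_loss=0.15, val_loss=0.60)
print(f"accuracy gap: {gap:.2f}, overfit: {overfit}")
```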

Your Task

Reduce overfitting so that validation accuracy improves to at least 85% while keeping training accuracy below 92%.

Keep using the GRU architecture.

Do not change the dataset or preprocessing steps.

You can adjust hyperparameters and add regularization techniques.
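The accuracy targets in the task can be written down as a small check. This is a sketch: `meets_target` and its parameter names are hypothetical helpers, with the thresholds taken from the task statement.

```python
def meets_target(train_acc, val_acc, min_val_acc=0.85, max_train_acc=0.92):
    """True when validation accuracy is at least 85% and training accuracy is below 92%."""
    return val_acc >= min_val_acc and train_acc < max_train_acc


# Starting point of the exercise: 95% training, 70% validation accuracy.
print(meets_target(train_acc=0.95, val_acc=0.70))
```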

Solution

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense, Dropout
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.datasets import imdb

# Load data
max_features = 10000
maxlen = 100
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)

# Pad sequences
X_train = pad_sequences(X_train, maxlen=maxlen)
X_test = pad_sequences(X_test, maxlen=maxlen)

# Build model with dropout and fewer units
model = Sequential([
    Embedding(max_features, 64),  # input_length is deprecated in recent Keras; the length is inferred from the input
    GRU(32, dropout=0.3, recurrent_dropout=0.3),
    Dropout(0.3),
    Dense(1, activation='sigmoid')
])

model.compile(
    loss='binary_crossentropy',
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    metrics=['accuracy']
)

# Early stopping callback
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# Train model
history = model.fit(
    X_train, y_train,
    epochs=20,
    batch_size=64,
    validation_split=0.2,
    callbacks=[early_stop],
    verbose=2
)

# Evaluate on test data
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)

print(f'Test accuracy: {test_acc:.2f}', f'Test loss: {test_loss:.2f}')
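For reference, `pad_sequences` with its default settings pads with zeros at the front and, for sequences longer than `maxlen`, keeps the last `maxlen` tokens. A pure-Python sketch of that default behaviour follows; the helper name and token ids are made up, and it assumes `maxlen` is at least 1.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad or left-truncate each sequence to exactly `maxlen` items."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # keep the last `maxlen` tokens
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded


reviews = [[11, 12], [21, 22, 23, 24, 25, 26]]  # made-up token ids
print(pad_pre(reviews, maxlen=4))
```

With `maxlen = 100`, as in the solution, short reviews get leading zeros and long reviews lose their earliest words.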

Reduced GRU units from 64 to 32 to simplify the model.

Added dropout inside GRU and a separate dropout layer to reduce overfitting.

Added early stopping to stop training when validation loss stops improving.

Set the learning rate to 0.001, which is also the Keras default for Adam, for smoother training.
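To make the dropout change more concrete, here is a minimal pure-Python sketch of inverted dropout, the general mechanism behind a dropout layer: during training each value is zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)`, while at inference the input passes through unchanged. The function name and the fixed seed are assumptions for illustration; the GRU's `dropout` and `recurrent_dropout` arguments apply masks to the layer's inputs and recurrent state rather than to a plain list like this.

```python
import random


def inverted_dropout(values, rate, rng, training=True):
    """Zero each value with probability `rate`; rescale the rest by 1/(1-rate)."""
    if not training or rate == 0.0:
        return list(values)  # inference: no units are dropped
    keep_prob = 1.0 - rate
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 10_000
dropped = inverted_dropout(activations, rate=0.3, rng=rng)

# Rescaling keeps the average activation roughly unchanged.
print(f"mean after dropout: {sum(dropped) / len(dropped):.3f}")
print(f"fraction zeroed:    {dropped.count(0.0) / len(dropped):.3f}")
```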

Results Interpretation

Before: Training accuracy 95%, Validation accuracy 70%, Validation loss 0.60

After: Training accuracy 90%, Validation accuracy 86%, Validation loss 0.40

Adding dropout and early stopping reduces overfitting in GRU models for text. Here the gap between training and validation accuracy shrinks from 25 points to 4, which means more reliable predictions on new data.
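The early-stopping behaviour can be sketched in plain Python. The loop below mirrors the idea of `patience=3` with `restore_best_weights=True`: it tracks the best validation loss, counts epochs without improvement, and reports which epoch's weights would be kept. The function name and the loss values are made up for illustration and are not results from this experiment.

```python
def simulate_early_stopping(val_losses, patience=3):
    """Return (best_epoch, stopped_epoch) for a list of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch  # stop; weights from best_epoch are restored
    return best_epoch, len(val_losses) - 1  # ran out of epochs without stopping early


# Hypothetical validation losses: improving at first, then drifting upward.
losses = [0.60, 0.50, 0.42, 0.40, 0.41, 0.43, 0.45, 0.50]
best_epoch, stopped_epoch = simulate_early_stopping(losses, patience=3)
print(f"best epoch: {best_epoch}, stopped at epoch: {stopped_epoch}")
```

Epochs are zero-indexed in this sketch, so the last epoch in the list is never reached: training halts three epochs after the best one.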

Bonus Experiment

Try replacing the GRU layer with an LSTM layer and compare the validation accuracy and training time.

💡 Hint

An LSTM can capture longer dependencies but may train more slowly. Adjust dropout and units in the same way as for the GRU.
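One reason an LSTM may train more slowly is that it has four weight blocks per layer where a GRU has three. The sketch below counts trainable parameters for both, assuming the layer sizes from the solution (64-dimensional embeddings, 32 units) and Keras's default GRU bias layout (`reset_after=True`, which stores two bias vectors per block). The helper names are hypothetical.

```python
def gru_param_count(input_dim, units, reset_after=True):
    """Trainable parameters in a GRU layer: 3 blocks (update, reset, candidate)."""
    biases_per_block = 2 if reset_after else 1
    return 3 * (units * (input_dim + units) + biases_per_block * units)


def lstm_param_count(input_dim, units):
    """Trainable parameters in an LSTM layer: 4 blocks (input, forget, cell, output)."""
    return 4 * (units * (input_dim + units) + units)


gru_params = gru_param_count(input_dim=64, units=32)
lstm_params = lstm_param_count(input_dim=64, units=32)
print(f"GRU: {gru_params}, LSTM: {lstm_params}, ratio: {lstm_params / gru_params:.2f}")
```

Parameter count is only a proxy for training time; actual timings depend on hardware and on which kernel the framework selects.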

Practice

1. What is the main advantage of using a GRU (Gated Recurrent Unit) in text processing tasks? (Difficulty: easy)

A. It helps the model remember important information over time while ignoring less important details.

B. It increases the size of the input text automatically.

C. It converts text into images for better analysis.

D. It removes all punctuation from the text before processing.

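Solution

Step 1: Understand GRU's role in memory

A GRU uses update and reset gates to control how much of its previous hidden state is kept and how much is replaced by new input at each step.

Step 2: Compare options to GRU function

Only option A describes this selective memory. Options B, C, and D describe input resizing, image conversion, and punctuation removal, none of which a GRU does.

Final Answer:

A. It helps the model remember important information over time while ignoring less important details.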
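The gating idea this question targets can be illustrated with a single-unit GRU step in plain Python. This is a toy sketch, not the Keras implementation: the parameter names and values are hypothetical, and it uses the convention in which an update gate near 1 keeps the previous state.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One step of a single-unit GRU with scalar input and scalar state."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])  # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])  # candidate state
    return z * h_prev + (1.0 - z) * h_cand  # blend old state with candidate


# Hypothetical weights; only the update-gate bias differs between the two cases.
base = {"wz": 1.0, "uz": 1.0, "wr": 1.0, "ur": 1.0, "br": 0.0,
        "wh": 1.0, "uh": 1.0, "bh": 0.0}
remember = dict(base, bz=20.0)    # update gate saturates near 1: keep old state
overwrite = dict(base, bz=-20.0)  # update gate saturates near 0: take candidate

h_keep = gru_step(x=0.5, h_prev=0.3, p=remember)
h_new = gru_step(x=0.5, h_prev=0.3, p=overwrite)
print(f"kept state: {h_keep:.4f}, overwritten state: {h_new:.4f}")
```

In a trained model the gate values are learned per token, which is how the network decides what to carry forward through a review and what to discard.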
