Prompt Engineering / GenAI · ~20 mins

Embedding dimensionality considerations in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Embedding dimensionality considerations
Problem: You are training a text classification model using word embeddings. The embedding dimension is currently set to 300. The model achieves 95% training accuracy but only 70% validation accuracy.
Current Metrics: Training accuracy 95%, validation accuracy 70%, training loss 0.15, validation loss 0.65
Issue: The model is overfitting: the high embedding dimensionality creates far too many parameters for the data, which hurts generalization.
Your Task
Reduce overfitting by adjusting the embedding dimensionality so that validation accuracy reaches at least 80% while training accuracy stays below 90%.
Constraints:
- Change only the embedding dimension and directly related model parameters.
- Do not change the dataset or the model architecture except for the embedding size.
- Keep the number of training epochs and the batch size the same.
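To see why a 300-dimensional embedding invites overfitting here, compare the embedding table's parameter count to the dataset size. This is a back-of-the-envelope sketch; the vocabulary size of 10,000 and the 1,000 training samples are taken from the placeholder values in the solution code.

```python
vocab_size = 10_000       # matches the placeholder vocabulary in the solution code
train_samples = 1_000     # matches the dummy training set in the solution code

# The embedding table alone holds vocab_size * dim trainable weights.
params_300 = vocab_size * 300   # 3,000,000 parameters
params_100 = vocab_size * 100   # 1,000,000 parameters

print(f"300-dim embedding: {params_300:,} params "
      f"({params_300 // train_samples:,} per training sample)")
print(f"100-dim embedding: {params_100:,} params "
      f"({params_100 // train_samples:,} per training sample)")
```

With 3,000 embedding parameters per training sample, the 300-dim model has ample capacity to memorize the training set, which matches the observed 25-point gap between training and validation accuracy.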
Solution
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GlobalAveragePooling1D, Dense

# Sample data placeholders
vocab_size = 10000
max_length = 100

# Reduced embedding dimension from 300 to 100
embedding_dim = 100

model = Sequential([
    # input_length is optional (and removed in Keras 3), so it is omitted here
    Embedding(input_dim=vocab_size, output_dim=embedding_dim),
    GlobalAveragePooling1D(),
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Assume X_train, y_train, X_val, y_val are preloaded datasets.
# For demonstration, dummy data is generated below.
X_train = np.random.randint(0, vocab_size, size=(1000, max_length))
y_train = np.random.randint(0, 2, size=(1000,))
X_val = np.random.randint(0, vocab_size, size=(200, max_length))
y_val = np.random.randint(0, 2, size=(200,))

history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_val, y_val))
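After fit() returns, the task's success criteria can be checked from the History object's history dictionary. The sketch below substitutes hand-written illustrative numbers for history.history, since real per-epoch values vary from run to run.

```python
# Illustrative stand-in for history.history (real values vary run to run)
history_dict = {
    "accuracy":     [0.70, 0.80, 0.86, 0.88],
    "val_accuracy": [0.68, 0.75, 0.80, 0.82],
}

final_train_acc = history_dict["accuracy"][-1]
best_val_acc = max(history_dict["val_accuracy"])

# Task criteria: validation accuracy >= 80% with training accuracy < 90%
meets_goal = best_val_acc >= 0.80 and final_train_acc < 0.90
print(f"train={final_train_acc:.2f}, val={best_val_acc:.2f}, goal met: {meets_goal}")
```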
- Reduced the embedding dimension from 300 to 100 to decrease model complexity.
- Kept all other layers and training parameters unchanged.
Results Interpretation

Before: Training accuracy 95%, Validation accuracy 70%, Training loss 0.15, Validation loss 0.65

After: Training accuracy 88%, Validation accuracy 82%, Training loss 0.30, Validation loss 0.45

Reducing the embedding dimensionality lowers the model's parameter count, which curbs overfitting: with less capacity to memorize the training set, the model generalizes better and validation accuracy improves.
Bonus Experiment
Try increasing the embedding dimension beyond 300 and observe the effect on overfitting and validation accuracy.
💡 Hint
Increasing embedding size may increase overfitting and reduce validation accuracy if the model becomes too complex.
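One way to frame the bonus experiment is to note that the embedding layer's parameter count grows linearly with the dimension, so each increase adds capacity to memorize the training set. This sketch uses the same assumed 10,000-word vocabulary as the solution code:

```python
vocab_size = 10_000

# Parameter count of the embedding layer for each candidate dimension
param_counts = {dim: vocab_size * dim for dim in [50, 100, 300, 500]}
for dim, params in param_counts.items():
    print(f"dim={dim:>3}: {params:,} embedding parameters")

# With only 1,000 training samples, larger tables make memorization
# (high training accuracy, low validation accuracy) increasingly likely.
```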