0
0
Prompt Engineering / GenAIml~20 mins

Translation in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Choose your learning style9 modes available
Experiment - Translation
Problem:Build a machine translation model that translates English sentences into French.
Current Metrics:Training accuracy: 98%, Validation accuracy: 65%, Training loss: 0.05, Validation loss: 1.2
Issue:The model is overfitting: training accuracy is very high but validation accuracy is low, indicating poor generalization.
Your Task
Reduce overfitting so that validation accuracy improves to at least 80% while keeping training accuracy below 90%.
You can only modify the model architecture and training hyperparameters.
Do not change the dataset or preprocessing steps.
Hint 1
Hint 2
Hint 3
Hint 4
Solution
Prompt Engineering / GenAI
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping

# Sample data loading and preprocessing assumed done
# Define model with dropout and reduced complexity
input_dim = 10000  # vocabulary size
output_dim = 10000
embedding_dim = 256
latent_dim = 128  # reduced from 256

# Encoder
encoder_inputs = Input(shape=(None,))
encoder_embedding = tf.keras.layers.Embedding(input_dim, embedding_dim)(encoder_inputs)
encoder_lstm = LSTM(latent_dim, return_state=True, dropout=0.3)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_embedding)

# Decoder
decoder_inputs = Input(shape=(None,))
decoder_embedding = tf.keras.layers.Embedding(output_dim, embedding_dim)(decoder_inputs)
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True, dropout=0.3)
decoder_outputs, _, _ = decoder_lstm(decoder_embedding, initial_state=[state_h, state_c])
decoder_dense = Dense(output_dim, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Early stopping callback
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# Assume X_train_encoder, X_train_decoder, y_train, X_val_encoder, X_val_decoder, y_val are prepared
# model.fit([X_train_encoder, X_train_decoder], y_train, epochs=30, batch_size=64, validation_data=([X_val_encoder, X_val_decoder], y_val), callbacks=[early_stop])
Reduced LSTM units from 256 to 128 to lower model complexity.
Added dropout of 0.3 in LSTM layers to reduce overfitting.
Added early stopping to stop training when validation loss stops improving.
Lowered learning rate to 0.001 for smoother training.
Results Interpretation

Before: Training accuracy 98%, Validation accuracy 65%, Training loss 0.05, Validation loss 1.2

After: Training accuracy 88%, Validation accuracy 82%, Training loss 0.25, Validation loss 0.45

Adding dropout and early stopping, reducing model size, and lowering learning rate help reduce overfitting. This improves validation accuracy and makes the model generalize better to new data.
Bonus Experiment
Try using a transformer-based model for translation instead of LSTM to see if it improves accuracy further.
💡 Hint
Use TensorFlow's Transformer or Hugging Face's pretrained translation models and fine-tune on your dataset.