Multilingual models in NLP - ML Experiment: Train & Evaluate

Experiment - Multilingual models

Problem: You want to build a text classification model that works well on multiple languages using a multilingual transformer model. Currently, the model performs well on English but poorly on Spanish and French texts.

Current Metrics: Training accuracy: 95%, Validation accuracy (English): 90%, Validation accuracy (Spanish): 65%, Validation accuracy (French): 60%

Issue: The model overfits English data and underperforms on other languages, showing poor generalization across languages.
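The per-language numbers above only show up if validation accuracy is computed separately for each language instead of over the pooled set. A minimal sketch of that bookkeeping follows; the function name `accuracy_by_language`, the language tags, and the toy labels are all hypothetical and not part of the original experiment.

```python
from collections import defaultdict

def accuracy_by_language(y_true, y_pred, languages):
    """Return {language: accuracy} for parallel lists of labels, predictions, and language tags."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, lang in zip(y_true, y_pred, languages):
        total[lang] += 1
        correct[lang] += int(truth == pred)
    return {lang: correct[lang] / total[lang] for lang in total}

# Toy labels and predictions (illustrative only, not the experiment's data)
y_true = [1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0]
languages = ["en", "en", "es", "es", "fr", "fr"]

scores = accuracy_by_language(y_true, y_pred, languages)
print(scores)  # {'en': 1.0, 'es': 0.5, 'fr': 0.5}
```

Reporting one number per language makes a gap like 90% versus 60% visible; a single pooled accuracy would hide it whenever English dominates the validation set.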

Your Task

Reduce overfitting on English and improve validation accuracy on Spanish and French to at least 80%, while keeping English validation accuracy above 85%.

You can only adjust model training parameters and data preprocessing.

You cannot change the base multilingual transformer architecture.

You must keep training time reasonable (under 1 hour on a standard GPU).

Solution

import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer, AutoConfig
from sklearn.model_selection import train_test_split
import numpy as np

# Load multilingual model and tokenizer
model_name = 'distilbert-base-multilingual-cased'
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example data: texts and labels in English, Spanish, French
texts = [
    'This is a positive review.', 'Esta es una reseña positiva.', 'Ceci est une critique positive.',
    'This is a negative review.', 'Esta es una reseña negativa.', 'Ceci est une critique négative.'
]
labels = [1, 1, 1, 0, 0, 0]

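# Stratified split so both classes appear in training and validation.
# (Toy data only; a real run needs many labeled examples per language.)
train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts, labels, test_size=2, stratify=labels, random_state=42
)

# Tokenize a split (padded to a fixed length) and wrap it in a TensorFlow dataset
def make_dataset(split_texts, split_labels):
    encodings = tokenizer(split_texts, truncation=True, padding='max_length', max_length=64)
    return tf.data.Dataset.from_tensor_slices((dict(encodings), split_labels))

train_dataset = make_dataset(train_texts, train_labels).shuffle(len(train_texts)).batch(2)
val_dataset = make_dataset(val_texts, val_labels).batch(2)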

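# Load model with dropout increased via config.
# DistilBERT's config names these settings `dropout` and `attention_dropout`;
# the BERT-style `hidden_dropout_prob` / `attention_probs_dropout_prob` have no effect on it.
config = AutoConfig.from_pretrained(model_name, num_labels=2)
config.dropout = 0.3
config.attention_dropout = 0.3
model = TFAutoModelForSequenceClassification.from_pretrained(model_name, config=config)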

# Compile model with lower learning rate
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metrics = [tf.keras.metrics.SparseCategoricalAccuracy('accuracy')]
model.compile(optimizer=optimizer, loss=loss, metrics=metrics)

# Early stopping callback
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# Train model
history = model.fit(train_dataset, validation_data=val_dataset, epochs=10, callbacks=[early_stop])

Increased dropout rates in the model to 0.3 to reduce overfitting.

Lowered learning rate to 3e-5 for more stable training.

Added early stopping to prevent overfitting.

Used balanced multilingual data samples for training and validation.
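The last point, balanced multilingual data, is not spelled out in the solution code. One common way to get there is to oversample the under-represented languages in the training split until every language has as many examples as the largest one. The sketch below assumes examples are stored as `(text, label, language)` tuples; that layout and the helper name `balance_by_language` are hypothetical.

```python
import random
from collections import Counter, defaultdict

def balance_by_language(examples, seed=0):
    """Oversample each language up to the size of the largest one.

    `examples` is a list of (text, label, language) tuples (assumed layout).
    """
    rng = random.Random(seed)
    by_lang = defaultdict(list)
    for example in examples:
        by_lang[example[2]].append(example)

    target = max(len(items) for items in by_lang.values())
    balanced = []
    for items in by_lang.values():
        balanced.extend(items)
        # Draw extra samples with replacement until this language reaches `target`.
        balanced.extend(rng.choices(items, k=target - len(items)))
    rng.shuffle(balanced)
    return balanced

# Toy training split dominated by English (illustrative only)
train_examples = [
    ("This is a positive review.", 1, "en"),
    ("This is a negative review.", 0, "en"),
    ("I loved it.", 1, "en"),
    ("I hated it.", 0, "en"),
    ("Esta es una reseña positiva.", 1, "es"),
    ("Ceci est une critique négative.", 0, "fr"),
]

balanced = balance_by_language(train_examples)
language_counts = Counter(lang for _, _, lang in balanced)
print(language_counts)
```

Oversampling repeats existing examples, so it is best applied to the training split only; duplicating validation examples would not add any information to the per-language scores.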

Results Interpretation

Before: Training accuracy 95%, English val 90%, Spanish val 65%, French val 60%
After: Training accuracy 88%, English val 87%, Spanish val 82%, French val 80%

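Higher dropout and early stopping reduced overfitting to the English data: the gap between training accuracy and validation accuracy shrank from 30 points to 6 for Spanish and from 35 points to 8 for French, while English validation accuracy stayed above 85%. This shows how regularization and balanced training data improve a multilingual model's ability to generalize across languages.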
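The before and after figures can be checked against the task's targets with a few lines of arithmetic. This sketch uses only the percentages reported above; the helper names `meets_targets` and `generalization_gaps` are hypothetical.

```python
# Accuracies in percent, copied from the results above
before = {"train": 95, "en": 90, "es": 65, "fr": 60}
after = {"train": 88, "en": 87, "es": 82, "fr": 80}

def meets_targets(metrics):
    """Task targets: English above 85%, Spanish and French at least 80%."""
    return metrics["en"] > 85 and metrics["es"] >= 80 and metrics["fr"] >= 80

def generalization_gaps(metrics):
    """Training accuracy minus validation accuracy per language, in percentage points."""
    return {lang: metrics["train"] - metrics[lang] for lang in ("en", "es", "fr")}

print(meets_targets(before), generalization_gaps(before))
# False {'en': 5, 'es': 30, 'fr': 35}
print(meets_targets(after), generalization_gaps(after))
# True {'en': 1, 'es': 6, 'fr': 8}
```

The gap column is the quickest overfitting signal here: a large difference between training accuracy and a language's validation accuracy means the model has fit patterns that do not transfer to that language.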

Bonus Experiment

Try fine-tuning the multilingual model with language-specific adapters to improve performance on each language separately.

💡 Hint

Use adapter layers or lightweight fine-tuning techniques to specialize the model per language without losing multilingual knowledge.
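To make the adapter idea concrete, here is a minimal NumPy sketch of a residual bottleneck adapter: a small down-projection, a nonlinearity, and an up-projection added back to the input. It illustrates the general technique only and is not the API of any adapter library. The class name `BottleneckAdapter`, the bottleneck size of 64, and the per-language dictionary are assumptions; 768 is the hidden size of `distilbert-base-multilingual-cased`.

```python
import numpy as np

class BottleneckAdapter:
    """Residual bottleneck adapter: x + up(relu(down(x))). Forward pass only."""

    def __init__(self, hidden_size, bottleneck_size, seed=0):
        rng = np.random.default_rng(seed)
        self.w_down = rng.normal(0.0, 0.02, size=(hidden_size, bottleneck_size))
        self.b_down = np.zeros(bottleneck_size)
        # Zero-initialised up-projection: the adapter starts as an identity map,
        # so inserting it does not disturb the pretrained model's outputs.
        self.w_up = np.zeros((bottleneck_size, hidden_size))
        self.b_up = np.zeros(hidden_size)

    def __call__(self, x):
        h = np.maximum(x @ self.w_down + self.b_down, 0.0)  # ReLU
        return x + h @ self.w_up + self.b_up

    def num_parameters(self):
        return sum(p.size for p in (self.w_down, self.b_down, self.w_up, self.b_up))

# One small adapter per language; the shared transformer weights would stay frozen.
adapters = {lang: BottleneckAdapter(768, 64) for lang in ("en", "es", "fr")}

hidden = np.random.default_rng(1).normal(size=(2, 5, 768))  # (batch, tokens, hidden)
out = adapters["es"](hidden)
print(out.shape, adapters["es"].num_parameters())  # (2, 5, 768) 99136
```

In an actual fine-tuning run the adapter weights would be the only trainable parameters, and the adapter matching each input's language would be applied inside the transformer layers. Each language then adds roughly 99k parameters per adapter rather than a full copy of the model.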

Practice

(1/5)

1. What is the main advantage of using a multilingual model in natural language processing?

easy

A. It can understand and process multiple languages with a single model.

B. It requires training a separate model for each language.

C. It only works for English language tasks.

D. It uses more resources than training individual models.

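Solution

Step 1: Understand the purpose of multilingual models
A multilingual model is trained on text from many languages, so a single set of weights can process all of them.

Step 2: Compare advantages
Options B and C contradict that definition, and option D describes a cost, not an advantage.

Final Answer: A. It can understand and process multiple languages with a single model.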
