Which of the following is the main advantage of using a multilingual model over separate monolingual models?
Think about how sharing information between languages can help when some languages have less data.
Multilingual models share parameters across languages, letting them learn from the pooled data of all languages. This especially benefits low-resource languages. However, multilingual models do not always outperform monolingual models on every language, nor do they eliminate the need for preprocessing or remove bias.
Given the following code snippet using a multilingual tokenizer, what is the output tokens list?
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-multilingual-cased')
text = 'Hello, 你好, مرحبا'
tokens = tokenizer.tokenize(text)
print(tokens)
Remember that BERT uses WordPiece tokenization which may split words into subwords marked by '##'.
The multilingual BERT tokenizer splits Chinese characters individually and Arabic words into subwords with '##'. So '你' and '好' are separate tokens, and 'مرحبا' is split into 'م', '##رح', '##با'.
You want to build a text classification model for a low-resource language with very little labeled data. Which model choice is best?
Consider how pretrained knowledge from many languages can help when labeled data is scarce.
Large multilingual pretrained models have learned shared representations across languages and can transfer knowledge to low-resource languages during fine-tuning, improving performance. Training from scratch or translating data may be less effective.
You evaluate a multilingual model on three languages and get these F1 scores: English 0.85, Spanish 0.80, Swahili 0.60. Which metric best summarizes overall performance fairly?
Think about how to fairly combine scores when languages have different data sizes.
Weighted-average F1 accounts for the number of samples per language, giving a fair overall performance measure when data sizes differ. Macro-average treats all languages equally regardless of size, micro-average pools all samples ignoring language, and max ignores other languages.
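As a hypothetical illustration (the per-language sample counts below are assumed for the example, not given in the question), macro- and weighted-average F1 can be computed like this:

```python
# Per-language F1 scores from the question.
f1 = {'English': 0.85, 'Spanish': 0.80, 'Swahili': 0.60}

# Hypothetical sample counts per language (assumed, for illustration only).
n = {'English': 1000, 'Spanish': 800, 'Swahili': 200}

# Macro-average: every language counts equally, regardless of size.
macro = sum(f1.values()) / len(f1)

# Weighted-average: each language contributes in proportion to its samples.
total = sum(n.values())
weighted = sum(f1[lang] * n[lang] / total for lang in f1)

print(f"macro={macro:.3f}, weighted={weighted:.3f}")
```

With these assumed counts the macro-average is 0.750 while the weighted-average is 0.805, showing how the two metrics diverge when language sizes differ.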
You fine-tuned a multilingual model on a dataset with English and French texts. The model performs well on English but poorly on French. Which is the most likely cause?
Check if the tokenizer matches the model and supports all languages in your data.
If the tokenizer is not the multilingual tokenizer that matches the model, French text may be segmented into unknown or poorly matched subwords, degrading the model's input and therefore its performance. Model architectures generally support every token in their tokenizer's vocabulary, and data exclusion or optimizer incompatibility are less likely causes.