
BERT Pre-training in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual (intermediate)
What is the main goal of BERT's Masked Language Model (MLM) during pre-training?

BERT uses a special pre-training task called Masked Language Model (MLM). What is the main goal of MLM?

A. Predict randomly masked words in a sentence using context from both sides
B. Classify the sentiment of a sentence as positive or negative
C. Predict the next word in a sentence given all previous words
D. Translate a sentence from one language to another
💡 Hint

Think about how BERT learns from words hidden in the middle of sentences.
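To make the MLM setup concrete, here is a minimal pure-Python sketch of BERT-style input corruption: roughly 15% of positions are selected, and each selected token is replaced with `[MASK]` 80% of the time, with a random token 10% of the time, and left unchanged 10% of the time. The function name and the word-level tokenization are illustrative, not BERT's actual subword implementation.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Corrupt a token sequence for MLM training.

    Returns (corrupted_tokens, labels), where labels holds the original
    token at each selected position and None elsewhere -- only the
    selected positions contribute to the MLM loss.
    """
    rng = random.Random(seed)
    vocab = sorted(set(tokens))          # stand-in for a real vocabulary
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)           # model must recover this token
            r = rng.random()
            if r < 0.8:
                corrupted.append(mask_token)   # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: random token
            else:
                corrupted.append(tok)    # 10%: keep, but still predict
        else:
            labels.append(None)          # not scored in the MLM loss
            corrupted.append(tok)
    return corrupted, labels

sentence = "the cat sat on the mat because it was tired".split()
corrupted, labels = mask_tokens(sentence, seed=4)
```

Because the model must fill in the selected positions from the surrounding words on both sides, minimizing this loss forces it to learn bidirectional context.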

🧠 Conceptual (intermediate)
What is the purpose of the Next Sentence Prediction (NSP) task in BERT pre-training?

Besides MLM, BERT uses Next Sentence Prediction (NSP) during pre-training. What does NSP help BERT learn?

A. To predict the sentiment of a sentence
B. To determine if one sentence logically follows another
C. To translate sentences between languages
D. To generate new sentences from scratch
💡 Hint

Think about how BERT understands relationships between two sentences.

Model Choice (advanced)
Which architecture component enables BERT to use context from both left and right sides during MLM pre-training?

BERT can look at words before and after a masked word simultaneously. Which part of BERT's architecture allows this?

A. Unidirectional LSTM layers
B. Recurrent neural networks with attention
C. Convolutional neural networks
D. Bidirectional Transformer encoder layers
💡 Hint

Think about which architecture processes all words at once with attention.
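One way to see the difference is through the attention visibility mask. A causal (left-to-right) decoder masks out future positions, while a Transformer encoder like BERT's applies no such mask, so every token can attend to the whole sequence. This toy function (an illustrative name, not a real library API) builds that visibility matrix:

```python
def attention_visibility(n, bidirectional=True):
    """Return an n x n matrix where entry [i][j] is True if position i
    may attend to position j.

    A causal decoder hides future positions (j > i); BERT's encoder
    uses no causal mask, so every token sees the full sequence -- this
    is what lets MLM use context on both sides of a masked word.
    """
    return [[bidirectional or j <= i for j in range(n)] for i in range(n)]

encoder_mask = attention_visibility(5)                      # all True
decoder_mask = attention_visibility(5, bidirectional=False) # lower-triangular
```

In the encoder mask, the row for a masked token in the middle of the sentence is all `True`: the prediction can draw on both its left and right context simultaneously, which unidirectional LSTMs and causal decoders cannot do in a single pass.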

Metrics (advanced)
During BERT pre-training, which metric best indicates how well the model predicts masked tokens?

Which metric is commonly used to measure BERT's performance on the Masked Language Model task during pre-training?

A. Mean squared error of token embeddings
B. BLEU score for sentence generation
C. Accuracy of predicting masked tokens
D. F1 score for next sentence prediction
💡 Hint

Focus on how well the model guesses the hidden words correctly.
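The key detail of this metric is that it is computed only over the masked positions, not the whole sequence. A minimal sketch, assuming labels are `None` at unmasked positions (matching the MLM labeling convention above):

```python
def masked_token_accuracy(predictions, labels):
    """Fraction of masked positions predicted correctly.

    Positions with label None were not masked and are excluded
    from the score.
    """
    scored = [(p, l) for p, l in zip(predictions, labels) if l is not None]
    if not scored:
        return 0.0
    return sum(p == l for p, l in scored) / len(scored)

preds  = ["the", "cat", "sat"]
labels = [None, "cat", "mat"]   # two masked positions, one correct
acc = masked_token_accuracy(preds, labels)  # 0.5
```

In practice the cross-entropy loss over masked positions is what is optimized, but masked-token accuracy is the intuitive companion metric: how often does the model guess the hidden word exactly.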

🔧 Debug (expert)
What goes wrong if BERT's input tokens are not masked during MLM pre-training?

Suppose you accidentally feed BERT input sequences without masking any tokens during MLM pre-training. What is the most likely outcome?

A. The loss will be very low but the model won't learn to predict masked words
B. The model will train normally with no issues
C. A runtime error will occur due to missing mask tokens
D. The model will overfit quickly and produce random predictions
💡 Hint

Think about what happens if the model never has to guess missing words.
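This failure mode can be demonstrated without training anything: if no tokens are replaced with `[MASK]`, the target at every scored position is visible in the input, so a model that simply copies its input token is already perfect. A toy baseline (illustrative, not a real training loop) makes the point:

```python
def copy_baseline_accuracy(inputs, labels):
    """Accuracy of a model that just copies each input token.

    If nothing was masked, inputs == labels at every scored position,
    so this trivial strategy is perfect: the loss collapses toward
    zero while the model learns nothing about language.
    """
    scored = [(x, y) for x, y in zip(inputs, labels) if y is not None]
    return sum(x == y for x, y in scored) / len(scored)

tokens = "the cat sat on the mat".split()

# Forgot to mask: the "prediction" task is trivially solved by copying.
unmasked_acc = copy_baseline_accuracy(tokens, tokens)   # 1.0

# With masking, copying no longer works -- the model must use context.
masked_inputs = ["the", "[MASK]", "sat", "on", "the", "mat"]
masked_acc = copy_baseline_accuracy(masked_inputs, tokens)  # below 1.0
```

No runtime error occurs; the training loop runs fine, which is exactly why this bug is easy to miss. The telltale symptom is a suspiciously low loss from the very first steps.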