Recall & Review
beginner
What does BERT stand for in NLP?
BERT stands for Bidirectional Encoder Representations from Transformers. It is a model designed to understand the context of words in both directions in a sentence.
beginner
What are the two main tasks used in BERT pre-training?
The two main tasks are:
1. Masked Language Modeling (MLM): randomly hides some tokens (15% in the original BERT) and trains the model to predict them.
2. Next Sentence Prediction (NSP): trains the model to predict whether the second sentence of a pair actually follows the first.
intermediate
Why is BERT called 'bidirectional'?
Because BERT looks at the words before and after a target word at the same time during training. This helps it understand the full context, unlike older models that read text only left-to-right or right-to-left.
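To make the difference concrete, here is a minimal sketch of which neighbouring words each kind of model can use for one target word. The helper `visible_context` and the example sentence are illustrative assumptions, not part of BERT itself.

```python
def visible_context(tokens, target_index, bidirectional):
    """Return the neighbouring tokens a model may use for tokens[target_index]."""
    left = tokens[:target_index]
    # A left-to-right model cannot see anything after the target position.
    right = tokens[target_index + 1:] if bidirectional else []
    return left + right


tokens = ["The", "bank", "of", "the", "river"]

left_to_right = visible_context(tokens, 1, bidirectional=False)
both_sides = visible_context(tokens, 1, bidirectional=True)

print(left_to_right)  # ['The']
print(both_sides)     # ['The', 'of', 'the', 'river']
```

Only the bidirectional view includes "river", the word that tells the reader which sense of "bank" is meant.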
beginner
Explain Masked Language Modeling (MLM) in simple terms.
MLM is like a fill-in-the-blank game. Some words in a sentence are hidden, and BERT tries to guess those missing words using the surrounding words. This helps BERT learn word meanings and context.
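The fill-in-the-blank setup can be sketched in a few lines. This is a simplified illustration: the function name `mask_tokens`, the fixed seed, and the word-level tokens are assumptions made for the example (real BERT works on WordPiece tokens and uses a slightly richer replacement rule).

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide a fraction of tokens and remember the originals as training labels."""
    rng = random.Random(seed)
    # Always mask at least one token so short sentences still yield an example.
    n_to_mask = max(1, round(len(tokens) * mask_prob))
    positions = sorted(rng.sample(range(len(tokens)), n_to_mask))

    masked = list(tokens)
    labels = {}
    for pos in positions:
        labels[pos] = tokens[pos]  # what the model should predict here
        masked[pos] = MASK         # what the model actually sees
    return masked, labels


tokens = ["The", "cat", "sat", "on", "the", "mat"]
masked, labels = mask_tokens(tokens)
print(masked)
print(labels)
```

During training, the loss is computed only at the masked positions, comparing the model's guess with the stored label.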
intermediate
What is the purpose of Next Sentence Prediction (NSP) in BERT pre-training?
NSP teaches BERT to understand relationships between sentences. It learns to predict whether the second sentence of a pair actually follows the first, which helps in sentence-pair tasks such as question answering and natural language inference.
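A minimal sketch of how NSP training pairs might be built from a list of sentences. The function `make_nsp_pairs` and the toy sentences are assumptions for illustration; the original BERT setup draws the random sentence from a different document in the corpus, while this sketch samples from the same list for simplicity.

```python
import random


def make_nsp_pairs(sentences, seed=0):
    """Build (sentence_a, sentence_b, label) examples for Next Sentence Prediction."""
    if len(sentences) < 3:
        raise ValueError("need at least 3 sentences to sample a negative example")
    rng = random.Random(seed)
    pairs = []
    for i in range(len(sentences) - 1):
        first = sentences[i]
        if rng.random() < 0.5:
            # Positive example: the true next sentence.
            pairs.append((first, sentences[i + 1], "IsNext"))
        else:
            # Negative example: any sentence other than the first or its true successor.
            candidates = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
            pairs.append((first, rng.choice(candidates), "NotNext"))
    return pairs


sentences = [
    "I went to the store.",
    "I bought some milk.",
    "Penguins live in the Southern Hemisphere.",
    "They are excellent swimmers.",
]
pairs = make_nsp_pairs(sentences)
for sentence_a, sentence_b, label in pairs:
    print(label, "|", sentence_a, "->", sentence_b)
```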
What does BERT use to understand the context of words?
A. Bidirectional reading of sentences
B. Only left-to-right reading
C. Only right-to-left reading
D. Random word order
BERT reads sentences in both directions to understand full context.
In Masked Language Modeling, what does BERT try to predict?
A. The topic of the text
B. The next sentence
C. Hidden words in a sentence
D. The length of the sentence
MLM hides some words and BERT predicts those missing words.
What is the goal of Next Sentence Prediction in BERT?
A. Predict the next word in a sentence
B. Predict if one sentence follows another
C. Predict the sentiment of a sentence
D. Predict the length of a paragraph
NSP helps BERT learn if one sentence logically follows another.
Why is BERT pre-trained before fine-tuning on specific tasks?
A. To avoid training
B. To memorize answers
C. To reduce model size
D. To learn general language understanding
Pre-training helps BERT learn language patterns useful for many tasks.
Which architecture does BERT use?
A. Transformer Encoder
B. Convolutional Neural Network
C. Support Vector Machine
D. Recurrent Neural Network
BERT is based on the Transformer encoder architecture.
Describe the two main pre-training tasks of BERT and why they are important.
Think about how BERT learns words and sentence order.
Explain why BERT's bidirectional approach helps it understand language better than previous models.
Consider how knowing words before and after helps guess meaning.
Practice
1. What are the two main tasks used during BERT pre-training?
easy
A. Text Classification and Named Entity Recognition
B. Masked Language Model and Next Sentence Prediction
C. Part-of-Speech Tagging and Dependency Parsing
D. Sentiment Analysis and Machine Translation
Solution
Step 1: Understand BERT pre-training tasks
BERT is trained to predict masked words and to predict whether one sentence follows another. These correspond to Masked Language Model (MLM) and Next Sentence Prediction (NSP).
Step 2: Match tasks to options
Only option B lists both MLM and NSP, the two pre-training tasks of BERT. The other options list downstream tasks that BERT is fine-tuned for, not pre-trained on.
Final Answer:
Masked Language Model and Next Sentence Prediction -> Option B
Quick Check:
BERT pre-training tasks = MLM + NSP [OK]
Hint: Remember BERT guesses missing words and whether one sentence follows another [OK]
Common Mistakes:
Confusing fine-tuning tasks with pre-training tasks
Mixing up NLP tasks unrelated to BERT pre-training
Thinking BERT uses only one pre-training task
2. Which of the following is the correct way to describe the Masked Language Model (MLM) task in BERT pre-training?
easy
A. Predict randomly masked words in a sentence
B. Predict the next sentence given the current sentence
C. Classify the sentiment of a sentence
D. Translate a sentence to another language
Solution
Step 1: Define Masked Language Model (MLM)
MLM involves randomly masking some words in a sentence and training the model to predict those masked words.
Step 2: Match definition to options
Option A correctly describes MLM as predicting masked words. Option B describes NSP, while options C and D describe sentiment classification and machine translation.
Final Answer:
Predict randomly masked words in a sentence -> Option A
Quick Check:
MLM = predict masked words [OK]
Hint: MLM means guessing hidden words in sentences [OK]
Common Mistakes:
Confusing MLM with Next Sentence Prediction
Thinking MLM predicts entire sentences
Mixing MLM with classification tasks
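Beyond the simplified definition used in this question, the original BERT recipe does not always swap a selected token for [MASK]: 80% of selected tokens become [MASK], 10% become a random vocabulary token, and 10% are left unchanged. The sketch below illustrates that rule; the helper `corrupt_token` and the tiny vocabulary are assumptions made for the example.

```python
import random

MASK = "[MASK]"


def corrupt_token(token, vocab, rng):
    """Apply the 80/10/10 rule to one token already selected for prediction."""
    roll = rng.random()
    if roll < 0.8:
        return MASK               # 80%: replace with [MASK]
    if roll < 0.9:
        return rng.choice(vocab)  # 10%: replace with a random vocabulary token
    return token                  # 10%: keep the original token


vocab = ["cat", "dog", "mat", "sat", "the", "on"]
rng = random.Random(0)

# Corrupt the same selected token many times to see the proportions emerge.
outcomes = [corrupt_token("cat", vocab, rng) for _ in range(10_000)]
mask_share = outcomes.count(MASK) / len(outcomes)
print(f"share replaced by [MASK]: {mask_share:.2f}")
```

In all three cases the model is still trained to predict the original token at that position.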
3. Consider the following simplified code snippet for BERT pre-training MLM task:
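The snippet itself is not reproduced in this text. A minimal sketch consistent with the solution below might look like the following, where `toy_model` is a hypothetical stand-in for a trained BERT that simply looks up the original token.

```python
sentence = ["The", "cat", "sat", "on", "the", "mat"]
masked_sentence = ["The", "[MASK]", "sat", "on", "the", "mat"]


def toy_model(masked_tokens, original_tokens):
    """Hypothetical stand-in for BERT: a perfect model recovers the hidden token."""
    mask_index = masked_tokens.index("[MASK]")
    return original_tokens[mask_index]


predicted_word = toy_model(masked_sentence, sentence)
print(predicted_word)
```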
If the model works correctly, what should predicted_word be?
medium
A. 'cat'
B. 'mat'
C. 'dog'
D. 'sat'
Solution
Step 1: Identify the masked word in the sentence
The original sentence is ['The', 'cat', 'sat', 'on', 'the', 'mat'], and the masked sentence replaces 'cat' with '[MASK]'.
Step 2: Predict the masked word
The model should predict the missing word 'cat' to correctly fill the mask.
Final Answer:
'cat' -> Option A
Quick Check:
Masked word prediction = 'cat' [OK]
Hint: Masked word is replaced by [MASK], predict original word [OK]
Common Mistakes:
Choosing a word from the sentence but not the masked one
Confusing masked word with next sentence prediction
Assuming model predicts random words
4. In BERT pre-training, a common error is mixing up the Next Sentence Prediction (NSP) task. Which of the following statements is a mistake in NSP implementation?
medium
A. Feeding two sentences and predicting if the second follows the first
B. Randomly pairing sentences for negative examples
C. Using a binary classifier to decide sentence order
D. Predicting masked words inside a single sentence
Solution
Step 1: Understand NSP task
NSP involves feeding two sentences and predicting if the second sentence logically follows the first.
Step 2: Identify incorrect statement
Option D describes predicting masked words inside a single sentence, which is MLM rather than NSP. Treating it as part of NSP is therefore the mistake; options A, B, and C all describe valid parts of an NSP implementation.
Final Answer:
Predicting masked words inside a single sentence -> Option D
Quick Check:
NSP ≠ masked word prediction [OK]
Hint: NSP predicts whether the second sentence follows the first, not masked words [OK]
Common Mistakes:
Confusing NSP with MLM
Not using sentence pairs for NSP
Skipping negative examples in NSP
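For reference, BERT feeds a sentence pair as one sequence: `[CLS]`, sentence A, `[SEP]`, sentence B, `[SEP]`, with segment IDs marking which sentence each token belongs to. The binary NSP classifier reads the output at the `[CLS]` position. A minimal sketch, where the helper name `pack_pair` and the word-level tokens are assumptions for illustration:

```python
def pack_pair(tokens_a, tokens_b):
    """Combine two tokenized sentences into one BERT-style input sequence."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence A, and the first [SEP];
    # segment 1 covers sentence B and the final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


tokens, segment_ids = pack_pair(["the", "cat", "sat"], ["it", "purred"])
print(tokens)       # ['[CLS]', 'the', 'cat', 'sat', '[SEP]', 'it', 'purred', '[SEP]']
print(segment_ids)  # [0, 0, 0, 0, 0, 1, 1, 1]
```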
5. You want to improve BERT's understanding of sentence relationships by modifying the Next Sentence Prediction (NSP) task. Which approach would best enhance NSP during pre-training?
hard
A. Increase the percentage of masked words in MLM to 50%
B. Replace NSP with a sentiment classification task
C. Add more negative sentence pairs that are unrelated
D. Train only on single sentences without pairs
Solution
Step 1: Understand NSP goal
NSP aims to teach the model to distinguish if one sentence follows another logically by using positive and negative sentence pairs.
Step 2: Choose best enhancement
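Of the four options, only adding more negative (unrelated) sentence pairs keeps NSP intact and gives the classifier more contrast between sentences that follow each other and sentences that do not. Option A changes MLM rather than NSP, and options B and D remove the sentence-pair signal altogether.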
Final Answer:
Add more negative sentence pairs that are unrelated -> Option C
Quick Check:
More negative pairs = better NSP learning [OK]
Hint: More unrelated sentence pairs improve NSP task [OK]