NLP · ~5 min read

BERT pre-training concept in NLP - Cheat Sheet & Quick Revision

Recall & Review
beginner
What does BERT stand for in NLP?
BERT stands for Bidirectional Encoder Representations from Transformers. It is a model designed to understand the context of words in both directions in a sentence.
beginner
What are the two main tasks used in BERT pre-training?
The two main tasks are:
1. Masked Language Modeling (MLM): randomly hides some words and trains the model to predict them.
2. Next Sentence Prediction (NSP): trains the model to decide whether one sentence logically follows another.
intermediate
Why is BERT called 'bidirectional'?
Because BERT attends to the words both before and after a target word at the same time during training. This gives it the full context of the sentence, unlike older models that read text only left-to-right or right-to-left.
beginner
Explain Masked Language Modeling (MLM) in simple terms.
MLM is like a fill-in-the-blank game. Some words in a sentence are hidden, and BERT tries to guess those missing words using the surrounding words. This helps BERT learn word meanings and context.
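The fill-in-the-blank idea can be sketched in a few lines of plain Python. This is a toy illustration only: the `mask_tokens` helper is invented for this sheet, and it skips real BERT details such as subword tokenization and the 80/10/10 mask/random/keep replacement scheme. The 15% masking rate does follow the original setup.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Hide a random fraction of tokens; return the masked sequence
    and a dict mapping masked positions to the original words
    (the prediction targets the model would be trained on)."""
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            targets[i] = tok          # remember the hidden word
            masked.append(mask_token)  # show [MASK] to the model
        else:
            masked.append(tok)
    return masked, targets
```

For example, `mask_tokens("the cat sat on the mat".split())` might return `['the', 'cat', '[MASK]', 'on', 'the', 'mat']` with targets `{2: 'sat'}`; the model then learns to recover "sat" from the surrounding words.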
intermediate
What is the purpose of Next Sentence Prediction (NSP) in BERT pre-training?
NSP teaches BERT to understand relationships between sentences. It learns to predict whether one sentence naturally follows another, which helps in tasks like question answering and natural language inference.
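The sentence-pair setup can be sketched the same way. The `make_nsp_pairs` helper below is assumed for illustration: it labels half the pairs as true next sentences (label 1) and half as random negatives (label 0), whereas real pre-training draws negatives from the whole corpus.

```python
import random

def make_nsp_pairs(sentences, seed=None):
    """Build (sentence_a, sentence_b, is_next) training examples:
    roughly half true next-sentence pairs, half random pairs."""
    rng = random.Random(seed)
    pairs = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            # Positive example: the actual next sentence.
            pairs.append((sentences[i], sentences[i + 1], 1))
        else:
            # Negative example: a random sentence (in this toy version
            # it could coincidentally be the true next sentence).
            j = rng.randrange(len(sentences))
            pairs.append((sentences[i], sentences[j], 0))
    return pairs
```

BERT is then trained to output the `is_next` label from the paired input, which is what forces it to model how sentences relate to each other.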
What does BERT use to understand the context of words?
A. Bidirectional reading of sentences
B. Only left-to-right reading
C. Only right-to-left reading
D. Random word order
In Masked Language Modeling, what does BERT try to predict?
A. The topic of the text
B. The next sentence
C. Hidden words in a sentence
D. The length of the sentence
What is the goal of Next Sentence Prediction in BERT?
A. Predict the next word in a sentence
B. Predict if one sentence follows another
C. Predict the sentiment of a sentence
D. Predict the length of a paragraph
Why is BERT pre-trained before fine-tuning on specific tasks?
A. To avoid training
B. To memorize answers
C. To reduce model size
D. To learn general language understanding
Which architecture does BERT use?
A. Transformer Encoder
B. Convolutional Neural Network
C. Support Vector Machine
D. Recurrent Neural Network
Describe the two main pre-training tasks of BERT and why they are important.
Hint: Think about how BERT learns both word meanings and sentence order.
Explain why BERT's bidirectional approach helps it understand language better than previous models.
Hint: Consider how knowing the words before and after a target word helps guess its meaning.