Recall & Review
beginner
What does BERT stand for and what is its main purpose?
BERT stands for Bidirectional Encoder Representations from Transformers. It is a model designed to understand the context of words in a sentence by looking at both left and right sides, helping machines understand language better.
beginner
Why is BERT called 'bidirectional'?
BERT's Transformer encoder uses self-attention, so every token attends to all other tokens in the sentence at once. A word's representation therefore reflects its full context, the words on both its left and its right, unlike earlier language models that read text in only one direction.
intermediate
What is the role of the [CLS] token in BERT for text classification?
The [CLS] token is added at the start of every input sentence. After processing, its output embedding is used as a summary representation of the whole sentence for classification tasks.
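The idea can be shown with a minimal pure-Python sketch (toy numbers, not real BERT output): the encoder produces one vector per input token, and the classifier simply reads off the first row, which corresponds to [CLS].

```python
# Toy encoder output: one hidden vector per input token (made-up values).
# Row 0 corresponds to the [CLS] token.
hidden_states = [
    [0.1, 0.4, -0.2],  # [CLS]  <- summary vector used for classification
    [0.9, 0.0, 0.3],   # "great"
    [0.2, -0.5, 0.7],  # "movie"
    [0.0, 0.1, 0.0],   # [SEP]
]

def pool_cls(hidden_states):
    """Return the [CLS] embedding (first row) as the sentence summary."""
    return hidden_states[0]

sentence_vector = pool_cls(hidden_states)
print(sentence_vector)  # [0.1, 0.4, -0.2]
```

In practice the same selection is done on the model's output tensor; only this one vector (per example) is passed to the classification layer.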
intermediate
How do you fine-tune BERT for a text classification task?
You add a simple classification layer on top of BERT's output (usually on the [CLS] token embedding) and train the whole model on your labeled data, adjusting weights to improve prediction accuracy.
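A minimal sketch of such a classification head, in pure Python with made-up numbers (real fine-tuning would use a deep-learning framework and update these weights by backpropagation):

```python
import math

def classify_from_cls(cls_vec, weights, bias):
    """Tiny classification head: logits = W @ cls_vec + b, then softmax.
    During fine-tuning, W, b, and all of BERT's weights are updated."""
    logits = [sum(w * x for w, x in zip(row, cls_vec)) + b
              for row, b in zip(weights, bias)]
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]      # class probabilities

# Hypothetical 3-dim [CLS] vector and a 2-class head (illustrative values).
cls_vec = [0.1, 0.4, -0.2]
W = [[1.0, 0.5, 0.0],   # class 0, e.g. "negative"
     [0.0, 0.5, 1.0]]   # class 1, e.g. "positive"
b = [0.0, 0.0]
probs = classify_from_cls(cls_vec, W, b)
```

The head is typically just one linear layer, so almost all of the capacity comes from the pre-trained encoder underneath it.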
beginner
What metric is commonly used to evaluate BERT's performance on text classification?
Accuracy, the proportion of texts BERT classifies correctly out of all examples, is the most common metric. For imbalanced datasets, precision, recall, and F1-score give a clearer picture of performance.
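Both metrics are easy to compute by hand; a small pure-Python sketch on made-up labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_binary(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

y_true = [1, 0, 1, 1, 0]   # toy ground-truth labels
y_pred = [1, 0, 0, 1, 1]   # toy model predictions
print(accuracy(y_true, y_pred))  # 0.6
```

In real projects these usually come from a library such as scikit-learn, but the definitions are exactly the ones above.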
What does the [SEP] token do in BERT input?
The [SEP] token is used to separate two sentences or segments in BERT input, helping the model understand sentence boundaries.
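How [CLS] and [SEP] fit together can be sketched in a few lines of plain Python (token strings only; a real tokenizer would also map them to ids):

```python
def build_bert_input(tokens_a, tokens_b=None):
    """Assemble BERT-style input: [CLS] a [SEP] (b [SEP]), with segment ids
    marking which tokens belong to the first (0) or second (1) sentence."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if tokens_b is not None:
        tokens += tokens_b + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)
    return tokens, segment_ids

tokens, segments = build_bert_input(["how", "are", "you"], ["fine", "thanks"])
print(tokens)    # ['[CLS]', 'how', 'are', 'you', '[SEP]', 'fine', 'thanks', '[SEP]']
print(segments)  # [0, 0, 0, 0, 0, 1, 1, 1]
```

For single-sentence classification only the first [SEP] appears and every segment id is 0.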
Which part of BERT's output is used for classification tasks?
The [CLS] token's output embedding is used as a summary representation for classification.
What is the main advantage of fine-tuning BERT instead of training from scratch?
Fine-tuning uses pre-trained knowledge, so it needs less data and training time to perform well.
Which optimizer is commonly used when fine-tuning BERT?
AdamW optimizer is commonly used because it handles weight decay properly during fine-tuning.
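What "handles weight decay properly" means: AdamW decays the weight directly in the update step instead of folding the decay into the gradient as plain Adam does. A minimal single-weight sketch (illustrative only; in practice the framework's built-in AdamW is used):

```python
import math

def adamw_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """One AdamW update for a single scalar weight.
    The wd * w term is applied to the weight itself (decoupled decay),
    not mixed into the adaptive gradient estimate."""
    m = b1 * m + (1 - b1) * g          # first moment: running mean of gradients
    v = b2 * v + (1 - b2) * g * g      # second moment: running mean of squares
    m_hat = m / (1 - b1 ** t)          # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * w)
    return w, m, v

w, m, v = 1.0, 0.0, 0.0
w, m, v = adamw_step(w, g=0.5, m=m, v=v, t=1)  # weight moves down slightly
```

Typical BERT fine-tuning uses a small learning rate (around 2e-5 to 5e-5) with this optimizer.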
What does 'tokenization' mean in BERT preprocessing?
Tokenization splits text into tokens (words or subwords) that BERT can understand.
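BERT specifically uses WordPiece subword tokenization: rare words are split into known pieces, with continuations prefixed by "##". A greedy longest-match sketch over a tiny made-up vocabulary (real BERT vocabularies have ~30,000 entries):

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match subword split, WordPiece-style.
    Continuation pieces carry a '##' prefix; words that cannot be
    covered by the vocabulary become ['[UNK]']."""
    tokens, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate   # mark mid-word continuation
            if candidate in vocab:
                piece = candidate
                break
            end -= 1                           # try a shorter prefix
        if piece is None:
            return ["[UNK]"]
        tokens.append(piece)
        start = end
    return tokens

vocab = {"play", "##ing", "##ed", "un", "##play"}
print(wordpiece_tokenize("playing", vocab))  # ['play', '##ing']
```

Splitting into subwords keeps the vocabulary small while still covering almost any word the model encounters.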
Explain how BERT processes input text for classification, including tokenization, special tokens, and output usage.
Think about how BERT reads and prepares sentences before predicting.
Describe the steps to fine-tune a pre-trained BERT model on a new text classification dataset.
Focus on model modification, data preparation, training, and evaluation.