Recall & Review
beginner
What is BERT in the context of natural language processing?
BERT stands for Bidirectional Encoder Representations from Transformers. It is a model that understands language by looking at words before and after a target word, helping it grasp context better.
beginner
Why do we fine-tune BERT for classification tasks?
Fine-tuning adjusts BERT's pre-trained knowledge to a specific task, like classifying text, by training it on labeled examples so it learns to make predictions for that task.
intermediate
What is the role of the [CLS] token in BERT fine-tuning for classification?
The [CLS] token is a special token added at the start of input text. Its output embedding is used as a summary representation of the whole input for classification decisions.
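The selection itself is simple once the encoder has run: the classifier reads the hidden vector at position 0, where [CLS] sits. A minimal sketch with toy 4-dimensional vectors standing in for BERT's real hidden states (which are 768-dimensional in bert-base):

```python
# Sketch: selecting the [CLS] vector from per-token hidden states.
# These are toy 4-dim vectors, not real BERT outputs.
hidden_states = [
    [0.1, 0.2, 0.3, 0.4],  # position 0: [CLS]
    [0.5, 0.1, 0.0, 0.2],  # position 1: first word piece
    [0.3, 0.3, 0.1, 0.0],  # position 2: second word piece
]

# The classification head only looks at position 0.
cls_vector = hidden_states[0]
```

During fine-tuning, gradients flowing back through this single position teach the encoder to pool a summary of the whole input into it.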
intermediate
How is the output layer structured in BERT fine-tuning for a binary classification task?
A simple linear layer is added on top of BERT's [CLS] output embedding, followed by a sigmoid activation to predict the probability of the positive class.
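A minimal sketch of that head in plain Python, with made-up weights and a toy [CLS] vector (a real head's weights are learned during fine-tuning, and the [CLS] vector comes from the encoder):

```python
import math

def sigmoid(z):
    """Squash a real-valued score into a (0, 1) probability."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(cls_vector, weights, bias):
    # Linear layer: dot product of the [CLS] vector with the
    # learned weights, plus a bias term.
    z = sum(w * x for w, x in zip(weights, cls_vector)) + bias
    # Sigmoid turns the score into P(positive class).
    return sigmoid(z)

# Toy values for illustration only.
prob = classify([0.1, 0.2, 0.3, 0.4],
                weights=[1.0, -0.5, 2.0, 0.0],
                bias=0.1)
```

For more than two classes, the same pattern uses a linear layer with one output per class followed by a softmax instead of a sigmoid.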
beginner
What metrics are commonly used to evaluate BERT classification models?
Accuracy, precision, recall, and F1-score are common metrics. They measure how well the model predicts correct classes and balances false positives and negatives.
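All four metrics fall out of the counts of true/false positives and negatives. A small self-contained implementation with toy labels (in practice a library such as scikit-learn would compute these):

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual positives, how many were found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean of the two
    return accuracy, precision, recall, f1

# Toy predictions: one false negative and one false positive.
acc, prec, rec, f1 = classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```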
What does fine-tuning BERT involve?
Fine-tuning means adjusting the pre-trained BERT model weights on a specific task dataset to improve performance.
Which token's output embedding is used for classification in BERT?
The [CLS] token's output embedding summarizes the input and is used for classification.
What activation function is commonly used for binary classification output in BERT fine-tuning?
Sigmoid activation outputs a probability between 0 and 1 for binary classification.
Which metric is NOT typically used to evaluate classification models?
Mean Squared Error is used for regression, not classification.
What is the main advantage of BERT's bidirectional training?
BERT attends to context on both sides of every token at once, rather than reading left-to-right only, so each word's representation reflects its full surrounding context.
Explain the steps to fine-tune BERT for a text classification task.
Think about starting with BERT, adding a layer, training on examples, and checking results.
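One way to make these steps concrete is a toy training loop in plain Python: fixed vectors stand in for pre-trained [CLS] embeddings, a sigmoid head is added on top, and gradient descent on binary cross-entropy plays the role of fine-tuning. (In real fine-tuning the encoder weights are updated too; this sketch trains only the added head so it stays self-contained.)

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Step 1: start from pre-trained representations.
# Toy stand-ins for frozen BERT [CLS] embeddings of four labeled texts.
examples = [
    ([1.0, 0.0], 1),
    ([0.9, 0.1], 1),
    ([0.0, 1.0], 0),
    ([0.1, 0.9], 0),
]

# Step 2: add a classification layer (weights start untrained).
weights, bias, lr = [0.0, 0.0], 0.0, 0.5

# Step 3: train on labeled examples.
for _ in range(200):
    for x, y in examples:
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        p = sigmoid(z)        # predicted probability of the positive class
        grad = p - y          # gradient of binary cross-entropy w.r.t. z
        weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
        bias -= lr * grad

# Step 4: check results by thresholding at 0.5.
preds = [int(sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) > 0.5)
         for x, _ in examples]
```

In practice the same four steps run inside a framework (load pre-trained BERT, attach a head, train with an optimizer such as AdamW, then evaluate on held-out data).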
Describe why the [CLS] token is important in BERT fine-tuning for classification.
Consider how BERT summarizes input for decision making.