Recall & Review
beginner
What does BERT stand for and what is its main purpose?
BERT stands for Bidirectional Encoder Representations from Transformers. It is a model designed to understand the context of words in a sentence by looking at both left and right sides, helping machines understand language better.
beginner
Why is BERT called 'bidirectional'?
BERT's Transformer encoder uses self-attention, so every token attends to all other tokens in the sentence at once. A word's representation therefore reflects its full context, the words on both its left and its right, unlike earlier language models that read text in only one direction.
intermediate
What is the role of the [CLS] token in BERT for text classification?
The [CLS] token is added at the start of every input sentence. After processing, its output embedding is used as a summary representation of the whole sentence for classification tasks.
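The idea can be shown with a minimal pure-Python sketch (toy numbers, not real BERT output): the encoder produces one vector per input token, and the classifier simply reads off the first row, which corresponds to [CLS].

```python
# Toy encoder output: one hidden vector per input token (made-up values).
# Row 0 corresponds to the [CLS] token.
hidden_states = [
    [0.1, 0.4, -0.2],  # [CLS]  <- summary vector used for classification
    [0.9, 0.0, 0.3],   # "great"
    [0.2, -0.5, 0.7],  # "movie"
    [0.0, 0.1, 0.0],   # [SEP]
]

def pool_cls(hidden_states):
    """Return the [CLS] embedding (first row) as the sentence summary."""
    return hidden_states[0]

sentence_vector = pool_cls(hidden_states)
print(sentence_vector)  # [0.1, 0.4, -0.2]
```

In practice the same selection is done on the model's output tensor; only this one vector (per example) is passed to the classification layer.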
intermediate
How do you fine-tune BERT for a text classification task?
You add a simple classification layer on top of BERT's output (usually on the [CLS] token embedding) and train the whole model on your labeled data, adjusting weights to improve prediction accuracy.
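A minimal sketch of such a classification head, in pure Python with made-up numbers (real fine-tuning would use a deep-learning framework and update these weights by backpropagation):

```python
import math

def classify_from_cls(cls_vec, weights, bias):
    """Tiny classification head: logits = W @ cls_vec + b, then softmax.
    During fine-tuning, W, b, and all of BERT's weights are updated."""
    logits = [sum(w * x for w, x in zip(row, cls_vec)) + b
              for row, b in zip(weights, bias)]
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]      # class probabilities

# Hypothetical 3-dim [CLS] vector and a 2-class head (illustrative values).
cls_vec = [0.1, 0.4, -0.2]
W = [[1.0, 0.5, 0.0],   # class 0, e.g. "negative"
     [0.0, 0.5, 1.0]]   # class 1, e.g. "positive"
b = [0.0, 0.0]
probs = classify_from_cls(cls_vec, W, b)
```

The head is typically just one linear layer, so almost all of the capacity comes from the pre-trained encoder underneath it.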
beginner
What metric is commonly used to evaluate BERT's performance on text classification?
Accuracy, the proportion of texts BERT classifies correctly out of all examples, is the most common metric. For imbalanced datasets, precision, recall, and F1-score give a clearer picture of performance.
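Both metrics are easy to compute by hand; a small pure-Python sketch on made-up labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_binary(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

y_true = [1, 0, 1, 1, 0]   # toy ground-truth labels
y_pred = [1, 0, 0, 1, 1]   # toy model predictions
print(accuracy(y_true, y_pred))  # 0.6
```

In real projects these usually come from a library such as scikit-learn, but the definitions are exactly the ones above.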
What does the [SEP] token do in BERT input?
The [SEP] token is used to separate two sentences or segments in BERT input, helping the model understand sentence boundaries.
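How [CLS] and [SEP] fit together can be sketched in a few lines of plain Python (token strings only; a real tokenizer would also map them to ids):

```python
def build_bert_input(tokens_a, tokens_b=None):
    """Assemble BERT-style input: [CLS] a [SEP] (b [SEP]), with segment ids
    marking which tokens belong to the first (0) or second (1) sentence."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if tokens_b is not None:
        tokens += tokens_b + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)
    return tokens, segment_ids

tokens, segments = build_bert_input(["how", "are", "you"], ["fine", "thanks"])
print(tokens)    # ['[CLS]', 'how', 'are', 'you', '[SEP]', 'fine', 'thanks', '[SEP]']
print(segments)  # [0, 0, 0, 0, 0, 1, 1, 1]
```

For single-sentence classification only the first [SEP] appears and every segment id is 0.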
Which part of BERT's output is used for classification tasks?
The [CLS] token's output embedding is used as a summary representation for classification.
What is the main advantage of fine-tuning BERT instead of training from scratch?
Fine-tuning uses pre-trained knowledge, so it needs less data and training time to perform well.
Which optimizer is commonly used when fine-tuning BERT?
AdamW optimizer is commonly used because it handles weight decay properly during fine-tuning.
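What "handles weight decay properly" means: AdamW decays the weight directly in the update step instead of folding the decay into the gradient as plain Adam does. A minimal single-weight sketch (illustrative only; in practice the framework's built-in AdamW is used):

```python
import math

def adamw_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """One AdamW update for a single scalar weight.
    The wd * w term is applied to the weight itself (decoupled decay),
    not mixed into the adaptive gradient estimate."""
    m = b1 * m + (1 - b1) * g          # first moment: running mean of gradients
    v = b2 * v + (1 - b2) * g * g      # second moment: running mean of squares
    m_hat = m / (1 - b1 ** t)          # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * w)
    return w, m, v

w, m, v = 1.0, 0.0, 0.0
w, m, v = adamw_step(w, g=0.5, m=m, v=v, t=1)  # weight moves down slightly
```

Typical BERT fine-tuning uses a small learning rate (around 2e-5 to 5e-5) with this optimizer.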
What does 'tokenization' mean in BERT preprocessing?
Tokenization splits text into tokens (words or subwords) that BERT can understand.
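BERT specifically uses WordPiece subword tokenization: rare words are split into known pieces, with continuations prefixed by "##". A greedy longest-match sketch over a tiny made-up vocabulary (real BERT vocabularies have ~30,000 entries):

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match subword split, WordPiece-style.
    Continuation pieces carry a '##' prefix; words that cannot be
    covered by the vocabulary become ['[UNK]']."""
    tokens, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate   # mark mid-word continuation
            if candidate in vocab:
                piece = candidate
                break
            end -= 1                           # try a shorter prefix
        if piece is None:
            return ["[UNK]"]
        tokens.append(piece)
        start = end
    return tokens

vocab = {"play", "##ing", "##ed", "un", "##play"}
print(wordpiece_tokenize("playing", vocab))  # ['play', '##ing']
```

Splitting into subwords keeps the vocabulary small while still covering almost any word the model encounters.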
Explain how BERT processes input text for classification, including tokenization, special tokens, and output usage.
Think about how BERT reads and prepares sentences before predicting.
Describe the steps to fine-tune a pre-trained BERT model on a new text classification dataset.
Focus on model modification, data preparation, training, and evaluation.