BERT fine-tuning for classification in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
Model Choice
intermediate
Choosing the correct BERT model for fine-tuning

You want to fine-tune a BERT model for a text classification task with 3 classes on a modest compute budget. Which pretrained model is the most practical starting point?

A. bert-base-uncased
B. gpt-2
C. bert-large-cased
D. resnet50
💡 Hint

Pick a model designed for language understanding and classification.

Predict Output
intermediate
Output shape after BERT forward pass

Given this code snippet, what is the shape of outputs.logits?

from transformers import BertForSequenceClassification, BertTokenizer
import torch

model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=4)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
inputs = tokenizer(['Hello world', 'Test sentence'], padding=True, return_tensors='pt')
outputs = model(**inputs)
logits_shape = outputs.logits.shape
A. (768, 4)
B. (4, 2)
C. (2, 768)
D. (2, 4)
💡 Hint

Check batch size and number of labels.
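The answer follows from how the classification head is built: BertForSequenceClassification feeds the pooled [CLS] vector of each sentence (hidden size 768 for bert-base-uncased) through a linear layer with num_labels outputs, so the hidden dimension does not appear in the logits. The plain-Python sketch below mimics that head with a shrunken hidden size and made-up weights; it illustrates the shapes only and is not the library's code.

```python
# Toy classification head: a linear layer over the pooled [CLS] vector.
# Sizes are shrunk for illustration (bert-base-uncased uses hidden_size=768);
# the weights are made up, not trained values.

def linear(vector, weights, bias):
    """Return weights^T @ vector + bias for one example.

    weights has shape (hidden_size, num_labels); bias has length num_labels.
    """
    num_labels = len(bias)
    return [
        sum(vector[h] * weights[h][k] for h in range(len(vector))) + bias[k]
        for k in range(num_labels)
    ]

hidden_size, num_labels = 3, 4
# One pooled vector per sentence: a batch of 2 -> shape (2, hidden_size).
pooled = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]
weights = [[0.01 * (h + k) for k in range(num_labels)] for h in range(hidden_size)]
bias = [0.0] * num_labels

logits = [linear(v, weights, bias) for v in pooled]
logits_shape = (len(logits), len(logits[0]))
print(logits_shape)  # (2, 4): (batch_size, num_labels)
```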

Hyperparameter
advanced
Choosing learning rate for BERT fine-tuning

You are fine-tuning BERT for classification. Which learning rate is most appropriate to start with?

A. 5e-5
B. 0.1
C. 1.0
D. 0.000001
💡 Hint

Typical fine-tuning learning rates are small but not too tiny.
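For reference, the original BERT authors suggested trying 5e-5, 3e-5 or 2e-5 when fine-tuning. A starting rate like this is commonly paired with linear warmup followed by linear decay, which is the shape produced by transformers' get_linear_schedule_with_warmup. The pure-Python function below is a sketch of that shape, not the library's implementation; the step counts are hypothetical.

```python
def linear_warmup_decay_lr(step, total_steps, warmup_steps, peak_lr=5e-5):
    """Learning rate at `step`: rise linearly from 0 to peak_lr over
    warmup_steps, then fall linearly back to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)

# Hypothetical run: 1000 optimizer steps, the first 100 used for warmup.
total_steps, warmup_steps = 1000, 100
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_decay_lr(step, total_steps, warmup_steps))
```

Warmup keeps the first updates small while the freshly initialized classification head is still random, which helps avoid disturbing the pretrained weights early on.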

Metrics
advanced
Evaluating BERT classification performance

After fine-tuning BERT on a 3-class classification task, you get these predictions and true labels:

preds = [0, 2, 1, 1, 0]
labels = [0, 1, 1, 2, 0]

What is the accuracy?

A. 0.8
B. 0.4
C. 0.6
D. 0.2
💡 Hint

Count how many predictions match labels exactly.
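The count can be checked in a few lines of plain Python. Accuracy is the fraction of positions where the prediction equals the label; sklearn.metrics.accuracy_score computes the same value, but it is not needed here.

```python
preds = [0, 2, 1, 1, 0]
labels = [0, 1, 1, 2, 0]

# Accuracy = (number of positions where prediction == label) / total.
correct = sum(p == y for p, y in zip(preds, labels))
accuracy = correct / len(labels)
print(correct, accuracy)  # 3 0.6
```

Positions 0, 2 and 4 match, and positions 1 and 3 do not, so accuracy is 3/5.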

🔧 Debug
expert
Identifying error in BERT fine-tuning code

What goes wrong when this code runs?

from transformers import BertForSequenceClassification
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
inputs = {'input_ids': [[101, 2054, 2003, 1996, 2562, 102]], 'attention_mask': [[1, 1, 1, 1, 1, 1]]}
outputs = model(**inputs)
loss = outputs.loss
A. AttributeError: 'SequenceClassifierOutput' object has no attribute 'loss'
B. An error, because input_ids and attention_mask are plain Python lists instead of torch tensors
C. TypeError: forward() missing required positional argument 'labels'
D. No error, code runs successfully
💡 Hint

Check the type/format of input_ids and attention_mask.
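One way to fix the snippet is to convert the nested lists into 2-D integer tensors of shape (batch_size, seq_len) before calling the model. A tokenizer called with return_tensors='pt' produces such tensors directly. Also, without labels, outputs.loss is None rather than a loss value. The sketch below assumes PyTorch is installed and uses a hypothetical gold label of 1. The model call is left as a comment so the block runs without downloading weights.

```python
import torch

# The lists from the snippet above, as 2-D LongTensors (batch_size, seq_len).
input_ids = torch.tensor([[101, 2054, 2003, 1996, 2562, 102]], dtype=torch.long)
attention_mask = torch.tensor([[1, 1, 1, 1, 1, 1]], dtype=torch.long)
inputs = {"input_ids": input_ids, "attention_mask": attention_mask}

# Hypothetical gold label for the single example. Passing labels=... is what
# makes outputs.loss a tensor instead of None.
labels = torch.tensor([1])

print(inputs["input_ids"].shape, inputs["input_ids"].dtype)
# With `model` loaded as in the snippet, the fixed call would be:
# outputs = model(**inputs, labels=labels)
# loss = outputs.loss
```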

Practice

1. What is the main purpose of fine-tuning BERT for a classification task?
easy
A. To adapt BERT's knowledge to classify specific categories in your data
B. To train BERT from scratch on a large dataset
C. To reduce the size of the BERT model for faster inference
D. To convert text into images for classification

Solution

  1. Step 1: Understand BERT's pretraining

    BERT is pretrained on general-purpose objectives (masked language modeling and next-sentence prediction) over unlabeled text, so it needs adjustment for specific tasks like classification.
  2. Step 2: Purpose of fine-tuning

    Fine-tuning adapts BERT's learned language understanding to classify categories in your dataset.
  3. Final Answer:

    To adapt BERT's knowledge to classify specific categories in your data -> Option A
  4. Quick Check:

    Fine-tuning = adapt BERT for classification [OK]
Hint: Fine-tuning means adjusting BERT for your task, not training from scratch [OK]
Common Mistakes:
  • Thinking fine-tuning trains BERT from zero
  • Confusing fine-tuning with model compression
  • Assuming BERT outputs images
2. Which of the following is the correct way to tokenize text before feeding it to BERT in Python?
easy
A. tokens = text.split(' ')
B. tokens = tokenizer.encode_plus(text, return_tensors='pt')
C. tokens = tokenizer.tokenize(text)
D. tokens = text.lower()

Solution

  1. Step 1: Identify proper BERT tokenization method

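    A BERT tokenizer's encode_plus method converts text into token IDs and an attention mask. In recent versions of transformers, calling tokenizer(text) directly is the recommended equivalent.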
  2. Step 2: Compare options

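    Option B calls encode_plus with return_tensors='pt', which returns PyTorch tensors of token IDs plus an attention mask, ready to pass to the model. Options A and D never produce token IDs, and option C returns WordPiece strings without converting them to IDs.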
  3. Final Answer:

    tokens = tokenizer.encode_plus(text, return_tensors='pt') -> Option B
  4. Quick Check:

    Use encode_plus for BERT tokenization [OK]
Hint: Use tokenizer.encode_plus or tokenizer() for BERT input [OK]
Common Mistakes:
  • Using simple split instead of tokenizer
  • Only tokenizing without encoding IDs
  • Not returning tensors for model input
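For a batch, part of what tokenizer(..., padding=True) does is pad every sequence to a common length and record which positions are real tokens in attention_mask. The pure-Python sketch below reproduces only that padding step on already-encoded IDs. It is not the tokenizer's implementation. In bert-base-uncased, 0 is [PAD], 101 is [CLS] and 102 is [SEP]; the other IDs are illustrative.

```python
PAD_ID = 0  # [PAD] in bert-base-uncased; 101 = [CLS], 102 = [SEP]

def pad_batch(batch_ids, pad_id=PAD_ID):
    """Right-pad every sequence to the longest one and build attention masks
    (1 = real token the model attends to, 0 = padding it should ignore)."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Illustrative word-piece IDs; only 101/102 are the real special-token IDs.
batch = pad_batch([[101, 7592, 2088, 102], [101, 3231, 102]])
print(batch["input_ids"])       # [[101, 7592, 2088, 102], [101, 3231, 102, 0]]
print(batch["attention_mask"])  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```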
3. Given this snippet from a BERT fine-tuning script, what will print(predictions.argmax(dim=1)) output if the model produces logits [[2.0, 1.0], [0.5, 1.5]] for two samples?
import torch
logits = torch.tensor([[2.0, 1.0], [0.5, 1.5]])
predictions = logits
print(predictions.argmax(dim=1))
medium
A. tensor([2, 1])
B. tensor([1, 0])
C. tensor([1, 1])
D. tensor([0, 1])

Solution

  1. Step 1: Understand argmax(dim=1)

    Argmax along dim=1 finds the index of the maximum value in each row (sample).
  2. Step 2: Calculate argmax for each sample

    First row: max is 2.0 at index 0; second row: max is 1.5 at index 1.
  3. Final Answer:

    tensor([0, 1]) -> Option D
  4. Quick Check:

    Argmax per row = [0, 1] [OK]
Hint: Argmax dim=1 picks max index per sample row [OK]
Common Mistakes:
  • Confusing dim=0 with dim=1
  • Mixing up indices and values
  • Expecting values instead of indices
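The same per-row argmax can be written in plain Python. The sketch below also converts a row of logits into probabilities with softmax. Softmax is monotonic, so it does not change which class wins; it only rescales the scores so they sum to 1.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def argmax_rows(rows):
    """Index of the largest value in each row, like tensor.argmax(dim=1)."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]

logits = [[2.0, 1.0], [0.5, 1.5]]
print(argmax_rows(logits))  # [0, 1]
print([round(p, 3) for p in softmax(logits[0])])  # [0.731, 0.269]
```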
4. You run this training loop snippet (model is a BertForSequenceClassification), but loss.backward() fails with AttributeError: 'NoneType' object has no attribute 'backward'. What is the likely fix?
outputs = model(input_ids, attention_mask)
loss = outputs.loss
loss.backward()
medium
A. Pass labels to the model call: model(input_ids, attention_mask, labels=labels)
B. Remove loss.backward() call
C. Change input_ids to input_id
D. Call model with only input_ids

Solution

  1. Step 1: Understand error cause

    Hugging Face classification models compute outputs.loss only when labels are passed; without them, there is no loss to backpropagate.
  2. Step 2: Fix by passing labels

    Include labels argument in model call to get loss: model(input_ids, attention_mask, labels=labels).
  3. Final Answer:

    Pass labels to the model call: model(input_ids, attention_mask, labels=labels) -> Option A
  4. Quick Check:

    Missing labels argument causes loss error [OK]
Hint: Always pass labels to get loss during training [OK]
Common Mistakes:
  • Ignoring the missing labels argument
  • Removing backward call instead of fixing input
  • Changing variable names incorrectly
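When labels are passed to a single-label classifier with num_labels > 1, BertForSequenceClassification computes a cross-entropy loss between the logits and the labels (via torch.nn.CrossEntropyLoss). The plain-Python sketch below computes the same quantity for the toy logits from question 3. This makes clear that without labels there is nothing to compare the logits against. It is an illustration, not the library's code.

```python
import math

def cross_entropy(logits, labels):
    """Mean over examples of -log softmax(row)[label]."""
    losses = []
    for row, y in zip(logits, labels):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_sum_exp - row[y])  # = -log softmax(row)[y]
    return sum(losses) / len(losses)

logits = [[2.0, 1.0], [0.5, 1.5]]
print(round(cross_entropy(logits, [0, 1]), 4))  # 0.3133 (labels agree with argmax)
print(round(cross_entropy(logits, [1, 0]), 4))  # 1.3133 (labels disagree)
```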
5. You want to fine-tune BERT on a small dataset for sentiment classification. Which strategy helps avoid overfitting during training?
hard
A. Train BERT without tokenization to save time
B. Increase batch size to maximum and train longer
C. Use a small learning rate and add dropout layers
D. Remove the classification head and train only embeddings

Solution

  1. Step 1: Identify overfitting risks

    Small datasets can cause the model to memorize instead of generalize.
  2. Step 2: Apply regularization techniques

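    A small learning rate (the BERT authors suggest 2e-5 to 5e-5) keeps weight updates close to the pretrained values, and dropout randomly zeroes activations during training so the model cannot rely on memorized features. Both reduce overfitting.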
  3. Final Answer:

    Use a small learning rate and add dropout layers -> Option C
  4. Quick Check:

    Small LR + dropout reduces overfitting [OK]
Hint: Small learning rate + dropout helps generalize on small data [OK]
Common Mistakes:
  • Training longer without regularization
  • Skipping tokenization
  • Removing classification head incorrectly
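BERT already applies dropout (hidden_dropout_prob defaults to 0.1 in BertConfig), including just before the classification head. In PyTorch, it is active after model.train() and disabled after model.eval(). The plain-Python sketch below shows what a dropout layer does. It mirrors the inverted-dropout behavior of torch.nn.Dropout but is not its implementation, and the activation values are made up.

```python
import random

def dropout(values, p, training, rng=random):
    """Inverted dropout: during training, zero each value with probability p
    and scale survivors by 1 / (1 - p) so the expected value is unchanged.
    At evaluation time the input passes through untouched."""
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    return [x / keep if rng.random() >= p else 0.0 for x in values]

rng = random.Random(0)
hidden = [0.5, -1.2, 0.8, 2.0]  # made-up activations
print(dropout(hidden, p=0.1, training=True, rng=rng))
print(dropout(hidden, p=0.1, training=False))
```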