You want to fine-tune a BERT model for a text classification task with 3 classes. Which pretrained model is the best starting point?
Pick a model designed for language understanding and classification.
bert-base-uncased is a common pretrained BERT model suitable for classification tasks. GPT-2 is a language generation model, and ResNet50 is for images.
Given this code snippet, what is the shape of outputs.logits?
from transformers import BertForSequenceClassification, BertTokenizer
import torch

model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=4)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
inputs = tokenizer(['Hello world', 'Test sentence'], padding=True, return_tensors='pt')
outputs = model(**inputs)
logits_shape = outputs.logits.shape
Check batch size and number of labels.
The batch size is 2 (two sentences), and the model was loaded with num_labels=4, so it outputs one logit per class per example. The shape is therefore (2, 4).
You are fine-tuning BERT for classification. Which learning rate is most appropriate to start with?
Typical fine-tuning learning rates are small but not too tiny.
5e-5 is a common starting learning rate for fine-tuning BERT (values in the 2e-5 to 5e-5 range are typical). Rates that are too large (e.g., 0.1 or 1.0) cause training instability and can destroy the pretrained weights; rates that are too small (e.g., 1e-6) make convergence unnecessarily slow.
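A minimal sketch of wiring this learning rate into an optimizer. A small `nn.Linear` stands in for the model here to keep the example self-contained; in practice you would pass `model.parameters()` from your `BertForSequenceClassification` instance.

```python
import torch
from torch.optim import AdamW

# Stand-in module; replace with your fine-tuned BERT model's parameters.
model = torch.nn.Linear(768, 3)

# 5e-5 is a typical starting learning rate for BERT fine-tuning.
optimizer = AdamW(model.parameters(), lr=5e-5)
```

AdamW (Adam with decoupled weight decay) is the optimizer conventionally used for BERT fine-tuning.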
After fine-tuning BERT on a 3-class classification task, you get these predictions and true labels:
preds = [0, 2, 1, 1, 0]
labels = [0, 1, 1, 2, 0]
What is the accuracy?
Count how many predictions match labels exactly.
Matches at indices 0, 2, and 4 → 3 correct out of 5 → accuracy 0.6.
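The count above can be checked with a small helper. The `accuracy` function below is a hypothetical name for illustration, not part of any library.

```python
def accuracy(preds, labels):
    """Fraction of positions where prediction equals the true label."""
    assert len(preds) == len(labels)
    correct = sum(p == t for p, t in zip(preds, labels))
    return correct / len(labels)

preds = [0, 2, 1, 1, 0]
labels = [0, 1, 1, 2, 0]
print(accuracy(preds, labels))  # → 0.6
```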
What error does this code raise?
from transformers import BertForSequenceClassification
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
inputs = {'input_ids': [[101, 2054, 2003, 1996, 2562, 102]], 'attention_mask': [[1, 1, 1, 1, 1, 1]]}
outputs = model(**inputs)
loss = outputs.loss
Check the type/format of input_ids and attention_mask.
The model expects PyTorch tensors for input_ids and attention_mask, but plain Python lists are provided. The forward pass fails before the embedding lookup, typically with an AttributeError, because the model calls tensor methods (such as .size()) that lists do not have. The fix is to convert the lists with torch.tensor(), or to let the tokenizer build the tensors directly via return_tensors='pt'.
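A quick sketch of the fix, assuming the same hand-built inputs as above: convert each list to a tensor before calling the model.

```python
import torch

# The raw dict from the question: Python lists, not tensors.
inputs = {'input_ids': [[101, 2054, 2003, 1996, 2562, 102]],
          'attention_mask': [[1, 1, 1, 1, 1, 1]]}

# Convert every value to a LongTensor; shape becomes (batch=1, seq_len=6).
tensor_inputs = {k: torch.tensor(v) for k, v in inputs.items()}
print(tensor_inputs['input_ids'].shape)  # → torch.Size([1, 6])
```

In practice the simpler route is to skip manual dicts entirely and call the tokenizer with return_tensors='pt', as in the earlier snippet.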