PyTorch · ~20 mins

BERT for text classification in PyTorch - Practice Problems & Coding Challenges

Challenge - 5 Problems
🎖️
BERT Text Classification Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
Predict Output (intermediate)
What is the output shape of BERT's last hidden state?
Given the following PyTorch code snippet using a pretrained BERT model, what is the shape of the tensor last_hidden_state?
PyTorch
from transformers import BertModel, BertTokenizer
import torch

model = BertModel.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

inputs = tokenizer('Hello world!', return_tensors='pt')
outputs = model(**inputs)
last_hidden_state = outputs.last_hidden_state
print(last_hidden_state.shape)
A. torch.Size([1, 3, 768])
B. torch.Size([1, 2, 768])
C. torch.Size([1, 5, 768])
D. torch.Size([1, 4, 768])
💡 Hint
Count the number of tokens including special tokens added by the tokenizer.
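The hint can be checked with a quick torch-free sketch. It assumes bert-base-uncased's WordPiece tokenizer keeps "hello", "world", and "!" as single tokens (all three are in its vocabulary) and that the tokenizer wraps the sequence in [CLS] and [SEP]; the hidden size of bert-base is 768.

```python
# Toy sanity check of the output shape, no transformers install needed.
# Assumption: "Hello world!" -> WordPiece tokens ["hello", "world", "!"]
wordpieces = ["hello", "world", "!"]
tokens = ["[CLS]"] + wordpieces + ["[SEP]"]  # special tokens added by the tokenizer
batch_size, hidden_size = 1, 768             # one sentence; bert-base hidden size
shape = (batch_size, len(tokens), hidden_size)
print(shape)  # (1, 5, 768), matching torch.Size([1, 5, 768])
```

So the sequence dimension is 5, not 3: the two special tokens count toward the shape.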
Model Choice (intermediate)
Which BERT output is best for sentence classification?
When using BERT for text classification, which output from the model is typically used as input to the classification head?
A. The pooled output corresponding to the [CLS] token
B. The last hidden state of all tokens concatenated
C. The embedding of the first token in the vocabulary
D. The sum of all token embeddings
💡 Hint
Think about which token is designed to represent the whole sentence.
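To make the idea concrete, here is a minimal sketch of a classification head: a single linear layer applied to one 768-dimensional vector standing in for the pooled [CLS] output. The random vector and weights are invented for illustration; a real head (e.g. in BertForSequenceClassification) also applies dropout and is trained jointly with the encoder.

```python
import random
random.seed(0)

hidden_size, num_labels = 768, 2
# Stand-in for the pooled [CLS] representation of one sentence.
pooled = [random.random() for _ in range(hidden_size)]
# Linear layer: one weight row and one bias per class.
W = [[random.gauss(0, 0.02) for _ in range(hidden_size)] for _ in range(num_labels)]
b = [0.0] * num_labels

# logits[j] = W[j] . pooled + b[j] -- one score per class.
logits = [sum(w_i * x_i for w_i, x_i in zip(row, pooled)) + b_j
          for row, b_j in zip(W, b)]
print(len(logits))  # 2
```

The key point is the input: a single fixed-size sentence vector, which is exactly what the [CLS] pooled output provides.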
Hyperparameter (advanced)
Choosing learning rate for fine-tuning BERT
Which learning rate is generally recommended when fine-tuning a pretrained BERT model for text classification?
A. 5e-5
B. 0.1
C. 1.0
D. 0.000001
💡 Hint
Fine-tuning large pretrained models usually requires a small learning rate.
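Learning rates on the order of 1e-5 to 5e-5 are the common recommendation for BERT fine-tuning. The toy below is not BERT, just gradient descent on f(x) = x², but it shows the failure mode the hint points at: a large step size overshoots the minimum on every update and never converges, while a modest one does.

```python
def descend(lr, steps=50, x=1.0):
    """Gradient descent on f(x) = x**2 (gradient = 2x); a toy stand-in
    for why fine-tuning uses small learning rates."""
    for _ in range(steps):
        x = x - lr * 2 * x
    return x

print(abs(descend(5e-2)))  # shrinks by 0.9 per step -> converges toward 0
print(abs(descend(1.0)))   # each step maps x to -x -> oscillates forever
```

Too-small rates (option D) have the opposite problem: updates so tiny that the model barely moves off its pretrained weights in a few epochs.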
Metrics (advanced)
Which metric is best for imbalanced text classification?
For a text classification task with highly imbalanced classes, which evaluation metric gives the most reliable performance measure?
A. Precision
B. F1-score
C. Accuracy
D. Mean Squared Error
💡 Hint
Consider a metric that balances false positives and false negatives.
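A small worked example makes the hint concrete: on a 95/5 class split, a classifier that always predicts the majority class scores 95% accuracy but an F1 of 0, because F1 combines precision and recall and both collapse when the positive class is never predicted. (The helpers below are hand-rolled for illustration; in practice you would use a library such as scikit-learn.)

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives -> precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 95 negatives, 5 positives; a degenerate "always negative" classifier:
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(accuracy(y_true, y_pred))  # 0.95 -- looks excellent
print(f1(y_true, y_pred))        # 0.0  -- exposes the useless classifier
```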
🔧 Debug (expert)
Why does this BERT fine-tuning code raise a RuntimeError?
Consider this PyTorch code snippet for fine-tuning BERT. It raises a RuntimeError: 'element 0 of tensors does not require grad and does not have a grad_fn'. What is the cause?
PyTorch
from transformers import BertForSequenceClassification, BertTokenizer
import torch

model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

inputs = tokenizer('Test input', return_tensors='pt')
labels = torch.tensor([1])

with torch.no_grad():
    outputs = model(**inputs, labels=labels)
loss = outputs.loss
loss.backward()
A. The inputs dictionary keys are incorrect
B. The labels tensor is not on the same device as the model inputs
C. The tokenizer output is missing attention_mask
D. The forward pass runs inside a torch.no_grad() context, so the loss has no grad_fn
💡 Hint
Check if gradients are enabled during loss computation.
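The mechanism behind this error can be illustrated without torch at all. The sketch below is only an analogy for autograd, not its real implementation (the class Tiny and the grad_enabled flag are invented for illustration): when gradient tracking is off during the forward pass, the resulting "loss" carries no gradient history, so calling backward() on it must fail.

```python
class Tiny:
    """Toy stand-in for a tensor: records whether grad info was tracked."""
    def __init__(self, value, requires_grad):
        self.value = value
        self.requires_grad = requires_grad

    def backward(self):
        if not self.requires_grad:
            raise RuntimeError(
                "element 0 of tensors does not require grad "
                "and does not have a grad_fn")

grad_enabled = True  # mimics torch's global autograd switch

def forward(x):
    # With tracking disabled, the output is detached from any graph.
    return Tiny(x * 2, requires_grad=grad_enabled)

grad_enabled = False            # like running inside `with torch.no_grad():`
try:
    forward(3.0).backward()
except RuntimeError as e:
    print("RuntimeError:", e)

grad_enabled = True             # normal training: tracking on
forward(3.0).backward()         # succeeds, no exception
```

The fix in real code is the same idea in reverse: compute the loss outside any torch.no_grad() block so autograd records a grad_fn for it.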