```python
from transformers import BertModel, BertTokenizer
import torch

model = BertModel.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

inputs = tokenizer('Hello world!', return_tensors='pt')
outputs = model(**inputs)

last_hidden_state = outputs.last_hidden_state
print(last_hidden_state.shape)  # torch.Size([1, 5, 768])
```
The tokenizer adds the special tokens [CLS] and [SEP]. The text 'Hello world!' is tokenized into 'hello', 'world', and '!', giving a sequence of 5 tokens: [CLS] hello world ! [SEP]. With a batch size of 1 and a hidden size of 768 for bert-base-uncased, the shape of last_hidden_state is (1, 5, 768).
The pooled output (outputs.pooler_output) is produced by passing the [CLS] token's final hidden state through a linear layer followed by a tanh activation. It is designed to represent the entire input sequence and is commonly used for classification tasks.
Typical fine-tuning learning rates for BERT are in the range 2e-5 to 5e-5. Larger rates like 0.1 or 1.0 are too high and cause training instability.
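A minimal sketch of configuring such a fine-tuning learning rate with AdamW. The tiny nn.Linear here is a placeholder standing in for a real BERT model, and the weight_decay value is an illustrative assumption, not a prescribed setting:

```python
import torch

# Placeholder model; in practice this would be BertForSequenceClassification.
model = torch.nn.Linear(768, 2)

# Learning rate in the typical BERT fine-tuning range (2e-5 to 5e-5).
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
print(optimizer.param_groups[0]['lr'])  # 2e-05
```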
F1-score balances precision and recall, making it better for imbalanced datasets than accuracy which can be misleading.
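A small self-contained sketch of why accuracy misleads on imbalanced data while F1 does not. The helper function and the toy labels are illustrative, not from the original text:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for a single positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Imbalanced toy data: 1 positive out of 10. Predicting all-negative
# gets 90% accuracy but an F1 of 0 for the positive class.
y_true = [1] + [0] * 9
y_pred = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                                # 0.9
print(precision_recall_f1(y_true, y_pred)[2])  # 0.0
```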
```python
from transformers import BertForSequenceClassification, BertTokenizer
import torch

model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

inputs = tokenizer('Test input', return_tensors='pt')
labels = torch.tensor([1])

# Passing labels makes the model compute the classification loss.
outputs = model(**inputs, labels=labels)
loss = outputs.loss
loss.backward()
```
Note that model.eval() does not disable gradients; it only switches layers such as dropout and batch normalization to inference behavior. Gradients are disabled by torch.no_grad() (or torch.inference_mode()). If the forward pass runs inside such a context, the loss has no grad_fn and loss.backward() raises a RuntimeError.
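A minimal sketch of the distinction, using a bare tensor in place of a model: a forward pass under torch.no_grad() produces a result with no autograd graph, while the same computation outside it supports backward():

```python
import torch

x = torch.tensor([1.0], requires_grad=True)

# Inside no_grad: no graph is recorded, so backward() would raise.
with torch.no_grad():
    loss_no_grad = (x * 2).sum()
print(loss_no_grad.requires_grad)  # False

# Outside no_grad: the graph is tracked and backward() works.
loss = (x * 2).sum()
loss.backward()
print(x.grad)  # tensor([2.])
```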