Complete the code to import the BERT tokenizer from the transformers library.
from transformers import [1]
The BERT tokenizer is imported using BertTokenizer from the transformers library.
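To illustrate what BertTokenizer does once imported, here is a minimal pure-Python sketch of the greedy longest-match-first ("WordPiece") splitting it applies to each word. The tiny vocabulary and the helper name `wordpiece_tokenize` are illustrative assumptions, not part of the transformers API.

```python
def wordpiece_tokenize(word, vocab):
    """Split a single word into subword pieces, longest match first."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        cur = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces get the ## prefix
            if piece in vocab:
                cur = piece
                break
            end -= 1
        if cur is None:
            return ["[UNK]"]  # no piece matched: unknown token
        pieces.append(cur)
        start = end
    return pieces

# Toy vocabulary (BERT's real vocab has ~30k entries)
toy_vocab = {"play", "##ing", "##ed", "un", "##able"}
print(wordpiece_tokenize("playing", toy_vocab))  # ['play', '##ing']
```

The real tokenizer also handles lowercasing, punctuation splitting, and special tokens like [CLS] and [SEP], but the subword matching above is the core idea.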
Complete the code to create a masked language modeling label tensor where tokens to predict are marked with their IDs and others with -100.
labels = input_ids.clone()
labels[~masked_indices] = [1]
In BERT pre-training, tokens not masked are set to -100 in labels to ignore them during loss calculation.
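The labeling rule can be sketched without torch: masked positions keep their original token id in labels, and every other position is set to -100 so the loss skips it. The token ids below are made up for illustration.

```python
input_ids = [101, 7592, 2088, 102]            # e.g. [CLS] hello world [SEP]
masked_indices = [False, True, False, False]  # only "hello" was masked

# Masked positions keep their id (the prediction target);
# all others become -100 and are ignored by the loss.
labels = [tok if masked else -100
          for tok, masked in zip(input_ids, masked_indices)]
print(labels)  # [-100, 7592, -100, -100]
```

With tensors, `labels[~masked_indices] = -100` applies the same rule in one vectorized step.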
Complete the code that creates the attention mask for BERT input tokens.
attention_mask = (input_ids != [1]).long()
The attention mask marks padding tokens (usually 0) as 0 and real tokens as 1.
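A pure-Python sketch of the same rule, assuming BERT's padding token id of 0 (the ids themselves are illustrative):

```python
PAD_ID = 0
input_ids = [101, 7592, 2088, 102, 0, 0]  # sequence padded to length 6

# Real tokens -> 1, padding tokens -> 0
attention_mask = [1 if tok != PAD_ID else 0 for tok in input_ids]
print(attention_mask)  # [1, 1, 1, 1, 0, 0]
```

The tensor expression `(input_ids != 0).long()` produces the same 0/1 mask elementwise.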
Fill both blanks to create a dictionary for masked language modeling inputs and labels.
inputs = {
'input_ids': [1],
'labels': [2]
}
The inputs dictionary for BERT MLM includes input_ids and labels for training.
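Putting the pieces together, a sketch of the assembled dictionary (plain lists stand in for the torch tensors a model would expect; the ids are illustrative, with 103 as BERT's [MASK] id):

```python
input_ids = [101, 7592, 103, 102]     # [CLS] hello [MASK] [SEP]
labels    = [-100, -100, 2088, -100]  # only the masked position is predicted

inputs = {'input_ids': input_ids, 'labels': labels}
```

Passing such a dict to a masked-LM model (e.g. as keyword arguments) lets it compute the loss internally from input_ids and labels.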
Fill the blank to compute the masked language modeling loss using the model outputs and labels.
from torch.nn import CrossEntropyLoss

loss_fct = CrossEntropyLoss()
logits = outputs.logits
masked_lm_loss = loss_fct(logits.view(-1, logits.size([1])), labels.view(-1))
The logits tensor has shape (batch_size, sequence_length, vocab_size), so dimension 2 gives the vocabulary size used to flatten the logits for the loss calculation.
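The computation can be sketched in pure Python: CrossEntropyLoss with its default ignore_index of -100 averages the negative log-softmax of the target id over the flattened positions, skipping any position labeled -100. The helper name and toy numbers below are assumptions for illustration.

```python
import math

def mlm_loss(logits, labels):
    """Mean -log softmax[target] over positions whose label is not -100."""
    total, count = 0.0, 0
    for row, target in zip(logits, labels):
        if target == -100:
            continue  # ignored position, contributes nothing
        log_z = math.log(sum(math.exp(x) for x in row))  # log-sum-exp
        total += log_z - row[target]                     # -log softmax[target]
        count += 1
    return total / count

# Two flattened positions over a 3-word toy vocab
logits = [[2.0, 0.5, 0.1],   # position 0: ignored
          [0.2, 3.0, 0.3]]   # position 1: target id 1
labels = [-100, 1]
print(mlm_loss(logits, labels))
```

Because the first label is -100, only the second position contributes, exactly as in the BERT labeling scheme above.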