BERT pre-training uses two main tasks: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). For MLM, roughly 15% of input tokens are masked, and the key metric is cross-entropy loss: the average negative log-probability the model assigns to each masked token's true identity. Lower loss means the model is making better use of the surrounding context to predict the missing words.
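The loss computation can be sketched in a few lines of plain Python. This is a toy illustration, not BERT itself: the vocabulary, predicted distributions, and masked positions are invented for the example.

```python
import math

def masked_lm_loss(pred_probs, target_indices):
    """Mean negative log-likelihood of the correct token at each masked slot."""
    losses = [-math.log(probs[t]) for probs, t in zip(pred_probs, target_indices)]
    return sum(losses) / len(losses)

# Hypothetical vocabulary of four tokens.
vocab = ["the", "cat", "sat", "mat"]

# Model's predicted distributions over the vocabulary at two [MASK] positions.
preds = [
    [0.1, 0.7, 0.1, 0.1],  # masked token was "cat" (index 1)
    [0.2, 0.1, 0.6, 0.1],  # masked token was "sat" (index 2)
]
targets = [1, 2]

loss = masked_lm_loss(preds, targets)
print(round(loss, 4))  # lower loss = higher probability on the right tokens
```

A perfect model would put probability 1.0 on each correct token, driving the loss to zero; spreading probability over wrong tokens pushes the loss up.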
For NSP, the metric is binary classification accuracy: given a pair of sentences, the model predicts whether the second actually follows the first in the source text (during pre-training, half the pairs are true continuations and half are random). High accuracy indicates the model has learned inter-sentence relationships.
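Since NSP is a binary task, its accuracy is just the fraction of pairs classified correctly. A minimal sketch, with hypothetical labels and model outputs (1 = "sentence B follows sentence A"):

```python
# Ground-truth labels for five sentence pairs and the model's predictions.
# Both lists are invented for illustration, not real BERT outputs.
labels      = [1, 0, 1, 1, 0]
predictions = [1, 0, 0, 1, 0]

# Accuracy = number of correct predictions / total pairs.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(accuracy)  # 4 of 5 pairs correct
```

Note that a model guessing at random would score about 0.5 here, so NSP accuracy is only meaningful well above that baseline.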
These metrics matter because BERT learns language patterns from raw, unlabeled text. Low MLM loss and high NSP accuracy indicate that the pre-trained representations capture both word-level context and sentence-level relationships, which transfer to downstream tasks such as question answering.