An n-gram language model predicts the next word from the previous n−1 words. The standard evaluation metric is perplexity, which measures how well the model predicts a held-out sample: lower perplexity means the model is, on average, less surprised by each next word.
Perplexity matters because it quantifies the model's uncertainty. It can be read as the average number of equally likely next words the model is choosing between at each step, so a model with lower perplexity is more confident and its predictions track the data more closely.
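As a minimal sketch of the idea, the following computes perplexity for a bigram model with add-k smoothing, so unseen word pairs still get nonzero probability. The toy corpus, the helper names, and the smoothing constant are all illustrative assumptions, not a reference implementation.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count unigrams and bigrams from a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word, vocab_size, k=1.0):
    """Add-k smoothed conditional probability P(word | prev)."""
    return (bigrams[(prev, word)] + k) / (unigrams[prev] + k * vocab_size)

def perplexity(unigrams, bigrams, tokens, vocab_size):
    """exp of the average negative log-probability of each next word."""
    log_prob = 0.0
    count = 0
    for prev, word in zip(tokens, tokens[1:]):
        log_prob += math.log(bigram_prob(unigrams, bigrams, prev, word, vocab_size))
        count += 1
    return math.exp(-log_prob / count)

# Toy corpus (an assumption for illustration).
train = "the cat sat on the mat the cat sat on the rug".split()
uni, bi = train_bigram(train)
vocab = len(uni)

# A sequence resembling the training data scores a lower perplexity
# than a shuffled one: the model is less "surprised" by it.
seen = perplexity(uni, bi, "the cat sat on the mat".split(), vocab)
unseen = perplexity(uni, bi, "the rug sat on the cat".split(), vocab)
print(seen, unseen)
```

Note that the exponentiated average negative log-probability here is exactly the "geometric-mean branching factor" reading of perplexity: a value of 4 would mean the model is effectively choosing among four equally likely words at each step.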