
N-gram language models in NLP - Model Metrics & Evaluation

Which metric matters for N-gram language models and WHY

N-gram language models predict the next word from the previous N-1 words. The key metric is Perplexity, which measures how well the model predicts a held-out sample. Lower perplexity means the model assigns higher probability to the words that actually occur, i.e. it is less "surprised" by the text.

Perplexity is important because it directly shows how uncertain the model is. A model with low perplexity is more confident and accurate in its predictions.

Confusion matrix or equivalent visualization

N-gram models predict probabilities for many possible next words, so a confusion matrix is not typical. Instead, we look at Perplexity, which is calculated from the probabilities the model assigns to the correct next words.

Perplexity = 2^(- (1/N) * sum(log2 P(w_i | context)))

Where:
- N is the number of words
- P(w_i | context) is the predicted probability of the actual next word

Example:
If the model predicts the next word with probabilities: 
"cat"=0.5, "dog"=0.3, "bird"=0.2,
and the actual next word is "cat", the log probability is log2(0.5) = -1.
Perplexity is 2 raised to the negative average of these log probabilities over the test set.
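The calculation above can be sketched in Python (a minimal illustration; the probability list is made up for the example):

```python
import math

def perplexity(probs):
    """Perplexity from the probabilities a model assigned to each actual next word."""
    avg_log2 = sum(math.log2(p) for p in probs) / len(probs)
    return 2 ** (-avg_log2)

# Probabilities the model assigned to the correct next word at each position:
print(perplexity([0.5, 0.25, 0.25]))  # ~3.17: on average the model hesitates between ~3 words
```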
    
Precision vs Recall tradeoff with concrete examples

Precision and recall are not commonly used for N-gram language models because predictions are probabilistic over many words, not binary decisions.

Instead, the tradeoff is between model complexity and data sparsity. Using larger N-grams (like 4-grams) can improve prediction but needs more data. Smaller N-grams (like bigrams) are simpler but less precise.

Example: A trigram model might predict the last word of "I love cats" better than a bigram model because it conditions on two previous words ("I love"), but with too little training data it may never have seen "I love cats" and therefore assign the correct word a very low (or zero) probability.
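The sparsity effect shows up even on a toy corpus. A sketch with made-up data, using plain maximum-likelihood counts and no smoothing:

```python
from collections import Counter

corpus = "i love dogs . i love cats . you love cats .".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))

def p_bigram(w, prev):
    # P(w | prev) estimated from bigram counts
    return bigrams[(prev, w)] / unigrams[prev]

def p_trigram(w, prev2, prev1):
    # P(w | prev2 prev1); zero when the trigram (or its context) was never seen
    ctx = bigrams[(prev2, prev1)]
    return trigrams[(prev2, prev1, w)] / ctx if ctx else 0.0

print(p_trigram("cats", "i", "love"))    # 0.5 -- sharper than the bigram estimate
print(p_bigram("cats", "love"))          # ~0.67
print(p_trigram("dogs", "you", "love"))  # 0.0 -- unseen trigram: the sparsity problem
```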

What "good" vs "bad" metric values look like for N-gram models

Good: Perplexity well below the vocabulary size and lower than a simpler baseline (such as a unigram model) on the same dataset. Absolute values vary widely: character-level models can reach single digits, while word-level N-gram models on general English text typically land in the tens to hundreds.

Bad: Perplexity near the vocabulary size, which means the model is barely better than guessing uniformly over all words.

Note: Perplexity depends on dataset size and vocabulary. Comparing perplexity only makes sense between models on the same data.
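A useful reference point for reading perplexity values: a model that guesses uniformly over a vocabulary of V words has perplexity exactly V, so perplexity acts like an effective number of choices. A minimal sketch (vocabulary size and sequence length are made up):

```python
import math

def perplexity(probs):
    # 2 to the negative average log2 probability of the actual words
    return 2 ** (-sum(math.log2(p) for p in probs) / len(probs))

V = 1000
uniform = [1 / V] * 50  # a uniform model assigns 1/V to every next word
print(round(perplexity(uniform)))  # 1000: perplexity of the uniform model equals V
```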

Common pitfalls in metrics for N-gram language models
  • Data sparsity: Many N-grams never appear in training, causing zero probabilities and infinite perplexity if not smoothed.
  • Overfitting: Very large N-grams memorize training data but fail on new text, causing low training perplexity but high test perplexity.
  • Ignoring smoothing: Without smoothing techniques, the model assigns zero probability to unseen N-grams, breaking perplexity calculation.
  • Comparing perplexity across datasets: Perplexity values are not comparable if datasets differ in size or vocabulary.
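One standard fix for the zero-probability pitfall is add-one (Laplace) smoothing, sketched here on a toy bigram model (the corpus is made up for illustration):

```python
from collections import Counter

corpus = "the cat sat . the dog sat .".split()
vocab_size = len(set(corpus))  # 5 word types

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_mle(w, prev):
    # Unsmoothed estimate: zero for unseen bigrams, which makes perplexity infinite
    return bigrams[(prev, w)] / unigrams[prev]

def p_laplace(w, prev):
    # Add-one smoothing: pretend every possible bigram was seen one extra time
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + vocab_size)

print(p_mle("dog", "cat"))      # 0.0 -- log2(0) is undefined
print(p_laplace("dog", "cat"))  # small but nonzero, so perplexity stays finite
```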

Self-check question

Your trigram model has a perplexity of 150 on the test set but 20 on the training set. Is this model good? Why or why not?

Answer: No, this model is not good. The low training perplexity means it learned the training data well, but the very high test perplexity means it does not predict new text well. This suggests overfitting and poor generalization.

Key Result
Perplexity is the key metric for N-gram models; lower perplexity means better next-word prediction.