
N-gram language models in NLP - Model Pipeline Trace


An N-gram language model predicts the next word in a sentence by looking at the previous N-1 words. It learns from text data by counting word sequences and uses these counts to guess what comes next.

Data Flow - 6 Stages
Stage 1 - Data In
Input: 10,000 sentences x variable length
Operation: raw text sentences collected for training
Output: 10,000 sentences x variable length
Example: "I love machine learning", "The cat sat on the mat"
Stage 2 - Preprocessing
Input: 10,000 sentences x variable length
Operation: lowercase, remove punctuation, tokenize into words
Output: 10,000 sentences x variable-length token lists
Example: ["i", "love", "machine", "learning"]
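A minimal Python sketch of this preprocessing step (the function name and the exact punctuation-stripping rule are illustrative assumptions, not part of the original pipeline):

```python
import re

def preprocess(sentence):
    """Lowercase, strip punctuation, and split into word tokens."""
    sentence = sentence.lower()
    sentence = re.sub(r"[^a-z0-9\s]", "", sentence)  # keep letters, digits, spaces
    return sentence.split()

print(preprocess("I love machine learning!"))
# -> ['i', 'love', 'machine', 'learning']
```

Real pipelines often use a library tokenizer instead of a regex, but the effect is the same: raw strings become lists of normalized tokens.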
Stage 3 - Feature Engineering
Input: variable-length token lists
Operation: extract N-grams (e.g., bigrams for N=2)
Output: list of N-grams with counts
Example: [('i', 'love'), ('love', 'machine'), ('machine', 'learning')]
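N-gram extraction is just a sliding window over each token list. A short sketch (function name is an assumption for illustration):

```python
from collections import Counter

def extract_ngrams(tokens, n=2):
    """Slide a window of size n over the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["i", "love", "machine", "learning"]
bigrams = extract_ngrams(tokens, n=2)
print(bigrams)
# -> [('i', 'love'), ('love', 'machine'), ('machine', 'learning')]
counts = Counter(bigrams)  # tallies each N-gram across the corpus
```

Setting n=3 would yield trigrams such as ('i', 'love', 'machine') from the same window logic.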
Stage 4 - Model Trains
Input: N-gram counts
Operation: estimate the probability of each next word given the previous N-1 words
Output: probability tables for N-grams
Example: P('learning' | 'machine') = 0.8
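"Training" here is maximum-likelihood estimation from counts: P(next | prev) = count(prev, next) / count(prev). A bigram sketch under that assumption (no smoothing, which a production model would add):

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Estimate P(next | prev) = count(prev, next) / count(prev)."""
    pair_counts = Counter()
    prev_counts = Counter()
    for tokens in sentences:
        for prev, nxt in zip(tokens, tokens[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    probs = defaultdict(dict)
    for (prev, nxt), c in pair_counts.items():
        probs[prev][nxt] = c / prev_counts[prev]
    return probs

model = train_bigram_model([["i", "love", "machine", "learning"],
                            ["i", "love", "deep", "learning"]])
print(model["love"])
# -> {'machine': 0.5, 'deep': 0.5}
```

Because 'love' is followed once by 'machine' and once by 'deep', each continuation gets probability 0.5.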
Stage 5 - Metrics Improve
Input: validation text data
Operation: compute perplexity to measure model quality
Output: perplexity score (lower is better)
Example: perplexity = 120
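For a bigram model, perplexity is exp(-(1/N) Σ log P(w_i | w_{i-1})) over the validation tokens. A sketch, assuming the probability-table format from the training stage and a tiny floor probability for unseen pairs (that floor is an assumption; real models use proper smoothing):

```python
import math

def perplexity(model, tokens, eps=1e-10):
    """Exponential of the average negative log-probability per bigram."""
    log_prob = 0.0
    n = 0
    for prev, nxt in zip(tokens, tokens[1:]):
        p = model.get(prev, {}).get(nxt, eps)  # floor for unseen pairs
        log_prob += math.log(p)
        n += 1
    return math.exp(-log_prob / n)

# A model that always assigns probability 0.5 has perplexity 2.
model = {"a": {"b": 0.5}, "b": {"a": 0.5}}
print(perplexity(model, ["a", "b", "a"]))
# -> 2.0 (approximately)
```

Intuitively, perplexity is the average branching factor: a score of 120 means the model is, on average, as uncertain as a uniform choice among 120 words.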
Stage 6 - Prediction
Input: previous N-1 words
Operation: look up the probability tables to predict the next word
Output: next-word prediction
Example: input 'learning' -> output 'is'
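Prediction is a table lookup followed by an argmax over the stored continuations. A sketch using the same nested-dict table format assumed above:

```python
def predict_next(model, prev_word):
    """Return the highest-probability continuation of prev_word, or None."""
    candidates = model.get(prev_word)
    if not candidates:
        return None  # unseen context: a real model would back off or smooth
    return max(candidates, key=candidates.get)

model = {"machine": {"learning": 0.8, "shop": 0.2}}
print(predict_next(model, "machine"))
# -> learning
```

Sampling from the distribution instead of taking the argmax yields varied text generation from the same table.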
Training Trace - Epoch by Epoch

Loss
5.0 |*****
4.0 |****
3.0 |***
2.0 |**
1.0 |*
    +----------
     1 2 3 4 5  Epoch
Epoch | Loss ↓ | Accuracy ↑ | Observation
1     | 5.0    | 0.10       | Initial model with random-like predictions
2     | 3.8    | 0.25       | Model starts learning common word sequences
3     | 2.9    | 0.40       | Better prediction of next words
4     | 2.3    | 0.55       | Model captures frequent N-grams well
5     | 1.9    | 0.65       | Converging with improved next-word guesses
Prediction Trace - 3 Layers
Layer 1: Input previous words
Layer 2: Look up N-gram probabilities
Layer 3: Select next word
Model Quiz - 3 Questions
Test your understanding
What does the N-gram model use to predict the next word?
A. The entire sentence
B. Random word selection
C. The previous N-1 words
D. Only the first word
Key Insight
N-gram models learn word patterns by counting sequences and estimating probabilities. As training progresses, the model better predicts the next word, shown by decreasing loss and perplexity. This simple approach captures local word context effectively.