N-gram language models help computers guess the next word in a sentence by looking at the last few words. This makes talking to machines feel more natural.
0
0
N-gram language models in NLP
Introduction
When building a simple text predictor like a phone keyboard suggestion.
When checking if a sentence sounds natural or not.
When creating a basic chatbot that replies with common phrases.
When analyzing how often word groups appear in a book or article.
Syntax
NLP
def n_gram_model(text, n): tokens = text.split() n_grams = [tuple(tokens[i:i+n]) for i in range(len(tokens)-n+1)] return n_grams
The function splits text into words and groups them into sequences of length n.
Each group is called an n-gram, like pairs (bigrams) or triples (trigrams).
Examples
Returns bigrams: [('I', 'love'), ('love', 'machine'), ('machine', 'learning')]
NLP
n_gram_model('I love machine learning', 2)
Returns unigrams (single words): [('Hello',), ('world',)]
NLP
n_gram_model('Hello world', 1)
Returns trigrams: [('Data', 'science', 'is'), ('science', 'is', 'fun')]
NLP
n_gram_model('Data science is fun', 3)
Sample Model
This program splits a sentence into bigrams (pairs of words), counts how often each pair appears, and prints the counts.
NLP
from collections import Counter def n_gram_model(text, n): tokens = text.lower().split() n_grams = [tuple(tokens[i:i+n]) for i in range(len(tokens)-n+1)] return n_grams # Sample text text = 'I love machine learning and I love coding' # Create bigrams bigrams = n_gram_model(text, 2) # Count frequency of each bigram bigram_counts = Counter(bigrams) print('Bigrams and their counts:') for bigram, count in bigram_counts.items(): print(f'{bigram}: {count}')
OutputSuccess
Important Notes
N-gram models are simple but can miss meaning because they only look at a few words at a time.
They work best with lots of text to learn common word patterns.
Higher n (like 3 or 4) means more context but needs more data and computing power.
Summary
N-gram models group words into sequences to predict or analyze text.
They are easy to build and useful for simple language tasks.
Counting n-grams helps understand common word patterns in text.