
N-gram language models in NLP

Introduction

N-gram language models predict the next word in a sentence by looking at the previous few words. This simple idea powers features that make interacting with machines feel more natural.

When building a simple text predictor like a phone keyboard suggestion.
When checking if a sentence sounds natural or not.
When creating a basic chatbot that replies with common phrases.
When analyzing how often word groups appear in a book or article.
Syntax
def n_gram_model(text, n):
    tokens = text.split()
    n_grams = [tuple(tokens[i:i+n]) for i in range(len(tokens)-n+1)]
    return n_grams

The function splits text into words and groups them into sequences of length n.

Each group is called an n-gram, like pairs (bigrams) or triples (trigrams).
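One edge case follows directly from the comprehension above: if n is larger than the number of tokens, the range is empty and the function returns an empty list rather than raising an error. A quick check:

```python
def n_gram_model(text, n):
    tokens = text.split()
    return [tuple(tokens[i:i+n]) for i in range(len(tokens) - n + 1)]

# With only 2 tokens, asking for trigrams yields range(0), i.e. nothing.
print(n_gram_model('Hello world', 3))  # → []
```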

Examples
n_gram_model('I love machine learning', 2)
# Returns bigrams: [('I', 'love'), ('love', 'machine'), ('machine', 'learning')]

n_gram_model('Hello world', 1)
# Returns unigrams (single words): [('Hello',), ('world',)]

n_gram_model('Data science is fun', 3)
# Returns trigrams: [('Data', 'science', 'is'), ('science', 'is', 'fun')]
Sample Model

This program splits a sentence into bigrams (pairs of words), counts how often each pair appears, and prints the counts.

from collections import Counter

def n_gram_model(text, n):
    tokens = text.lower().split()
    n_grams = [tuple(tokens[i:i+n]) for i in range(len(tokens)-n+1)]
    return n_grams

# Sample text
text = 'I love machine learning and I love coding'

# Create bigrams
bigrams = n_gram_model(text, 2)

# Count frequency of each bigram
bigram_counts = Counter(bigrams)

print('Bigrams and their counts:')
for bigram, count in bigram_counts.items():
    print(f'{bigram}: {count}')
Output:

Bigrams and their counts:
('i', 'love'): 2
('love', 'machine'): 1
('machine', 'learning'): 1
('learning', 'and'): 1
('and', 'i'): 1
('love', 'coding'): 1
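Counts like these are the basis of prediction: a bigram model estimates the chance of a next word as count(previous, next) / count(previous), and a keyboard-style suggester simply picks the most frequent continuation. Here is a minimal sketch building on the sample model (the helper name predict_next is my own, not part of the original code):

```python
from collections import Counter

def n_gram_model(text, n):
    tokens = text.lower().split()
    return [tuple(tokens[i:i+n]) for i in range(len(tokens) - n + 1)]

def predict_next(bigram_counts, word):
    # Gather every bigram that starts with `word` and keep its count.
    candidates = {b[1]: c for b, c in bigram_counts.items() if b[0] == word}
    if not candidates:
        return None
    # Suggest the most frequent continuation (ties broken arbitrarily).
    return max(candidates, key=candidates.get)

counts = Counter(n_gram_model('I love machine learning and I love coding', 2))
print(predict_next(counts, 'i'))  # 'love' — it follows 'i' twice
```

A real keyboard would return the top few candidates instead of one, but the counting logic is the same.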
Important Notes

N-gram models are simple but can miss meaning because they only look at a few words at a time.

They work best with lots of text to learn common word patterns.

Higher n (like 3 or 4) means more context but needs more data and computing power.
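The data hunger of higher n shows up even on a tiny sample: as n grows, fewer n-grams repeat, so the model has less evidence per pattern. A quick illustration using the function from the sample model:

```python
from collections import Counter

def n_gram_model(text, n):
    tokens = text.lower().split()
    return [tuple(tokens[i:i+n]) for i in range(len(tokens) - n + 1)]

text = 'I love machine learning and I love coding'
for n in (1, 2, 3):
    counts = Counter(n_gram_model(text, n))
    repeated = sum(1 for c in counts.values() if c > 1)
    print(f'n={n}: {len(counts)} distinct n-grams, {repeated} seen more than once')
```

With this eight-word text, some unigrams and one bigram repeat, but every trigram is unique: larger n needs far more text before patterns recur.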

Summary

N-gram models group words into sequences to predict or analyze text.

They are easy to build and useful for simple language tasks.

Counting n-grams helps understand common word patterns in text.