0
0
NLPml~5 mins

Stemming (Porter, Snowball) in NLP

Choose your learning style9 modes available
Introduction

Stemming helps reduce words to their root form so computers can understand similar words as the same. It makes text simpler and easier to analyze.

When you want to group similar words like 'running' and 'runs' as the same word 'run' in a search engine.
When cleaning text data before training a machine learning model to reduce word variety.
When you want to count word frequencies ignoring different endings like 'talked' and 'talking'.
When building a chatbot that should understand different forms of a word as one concept.
Syntax
NLP
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer('english')

stemmed_word_porter = porter.stem('running')
stemmed_word_snowball = snowball.stem('running')

PorterStemmer is one of the oldest and most common stemmers.

SnowballStemmer is newer and supports multiple languages, often giving better results.

Examples
Stems 'running' to 'run' using PorterStemmer.
NLP
from nltk.stem import PorterStemmer
porter = PorterStemmer()
print(porter.stem('running'))
Stems 'running' to 'run' using SnowballStemmer.
NLP
from nltk.stem import SnowballStemmer
snowball = SnowballStemmer('english')
print(snowball.stem('running'))
Shows how PorterStemmer reduces different forms to similar roots.
NLP
words = ['runs', 'running', 'runner']
porter = PorterStemmer()
stemmed = [porter.stem(w) for w in words]
print(stemmed)
Sample Model

This program shows how both Porter and Snowball stemmers reduce words to their root forms. It prints the original words and their stemmed versions.

NLP
from nltk.stem import PorterStemmer, SnowballStemmer

words = ['running', 'runs', 'runner', 'easily', 'fairly']

porter = PorterStemmer()
snowball = SnowballStemmer('english')

porter_stems = [porter.stem(word) for word in words]
snowball_stems = [snowball.stem(word) for word in words]

print('Original words:', words)
print('Porter stems:', porter_stems)
print('Snowball stems:', snowball_stems)
OutputSuccess
Important Notes

Stemming may produce roots that are not real words, but they help group similar words.

SnowballStemmer often gives cleaner stems than PorterStemmer.

Stemming is different from lemmatization, which returns real dictionary words.

Summary

Stemming reduces words to their base form to simplify text.

Porter and Snowball are popular stemmers with slightly different results.

Use stemming to improve text analysis and machine learning on text data.