Stemming helps reduce words to their root form so computers can understand similar words as the same. It makes text simpler and easier to analyze.
0
0
Stemming (Porter, Snowball) in NLP
Introduction
When you want to group similar words like 'running' and 'runs' as the same word 'run' in a search engine.
When cleaning text data before training a machine learning model to reduce word variety.
When you want to count word frequencies ignoring different endings like 'talked' and 'talking'.
When building a chatbot that should understand different forms of a word as one concept.
Syntax
NLP
from nltk.stem import PorterStemmer, SnowballStemmer porter = PorterStemmer() snowball = SnowballStemmer('english') stemmed_word_porter = porter.stem('running') stemmed_word_snowball = snowball.stem('running')
PorterStemmer is one of the oldest and most common stemmers.
SnowballStemmer is newer and supports multiple languages, often giving better results.
Examples
Stems 'running' to 'run' using PorterStemmer.
NLP
from nltk.stem import PorterStemmer porter = PorterStemmer() print(porter.stem('running'))
Stems 'running' to 'run' using SnowballStemmer.
NLP
from nltk.stem import SnowballStemmer snowball = SnowballStemmer('english') print(snowball.stem('running'))
Shows how PorterStemmer reduces different forms to similar roots.
NLP
words = ['runs', 'running', 'runner'] porter = PorterStemmer() stemmed = [porter.stem(w) for w in words] print(stemmed)
Sample Model
This program shows how both Porter and Snowball stemmers reduce words to their root forms. It prints the original words and their stemmed versions.
NLP
from nltk.stem import PorterStemmer, SnowballStemmer words = ['running', 'runs', 'runner', 'easily', 'fairly'] porter = PorterStemmer() snowball = SnowballStemmer('english') porter_stems = [porter.stem(word) for word in words] snowball_stems = [snowball.stem(word) for word in words] print('Original words:', words) print('Porter stems:', porter_stems) print('Snowball stems:', snowball_stems)
OutputSuccess
Important Notes
Stemming may produce roots that are not real words, but they help group similar words.
SnowballStemmer often gives cleaner stems than PorterStemmer.
Stemming is different from lemmatization, which returns real dictionary words.
Summary
Stemming reduces words to their base form to simplify text.
Porter and Snowball are popular stemmers with slightly different results.
Use stemming to improve text analysis and machine learning on text data.