
How to Use PorterStemmer in NLTK for NLP Tasks

Use PorterStemmer from the nltk.stem module to reduce words to their stems in NLP tasks. Create an instance with PorterStemmer(), then call stem(word) on each word to get its stem.

Syntax

The PorterStemmer class is imported from nltk.stem. You create an instance with stemmer = PorterStemmer(). Then, use stemmer.stem(word) to get the stem of a single word.

Stemming strips common suffixes such as -ing and -s, so variants like 'running' and 'runs' map to the same stem, which makes text analysis easier.

python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
stemmed_word = stemmer.stem('running')

Example

This example shows how to stem a list of words using PorterStemmer. It prints the original word and its stemmed form side by side.

python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ['running', 'runs', 'runner', 'easily', 'fairly']

for word in words:
    print(f"{word} -> {stemmer.stem(word)}")
Output
running -> run
runs -> run
runner -> runner
easily -> easili
fairly -> fairli

Common Pitfalls

One common mistake is passing a sentence or phrase to stem() instead of individual words. PorterStemmer treats its input as a single word, so split the text into words first.

Another pitfall is expecting perfect English roots; stemming is a heuristic and may produce stems that are not real words.

python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Wrong: stemming a whole sentence
sentence = 'He is running fast'
# This raises no error, but the string is treated as one long word,
# so 'running' inside it is left unstemmed
# stemmed = stemmer.stem(sentence)  # Incorrect

# Right: stem each word separately
words = sentence.split()
stemmed_words = [stemmer.stem(word) for word in words]
print(stemmed_words)
Output
['he', 'is', 'run', 'fast']

Note that in current NLTK releases stem() lowercases its input by default, so 'He' becomes 'he'.
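
Splitting on whitespace leaves punctuation attached to the words, so a token like 'fast.' would reach the stemmer with its period. The sketch below shows one possible way around that, using a simple regular expression to keep only runs of letters before stemming. The helper name stem_sentence and the letters-only pattern are assumptions made for illustration, not part of NLTK; a real pipeline might use a proper tokenizer instead.

```python
import re

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()


def stem_sentence(sentence):
    """Hypothetical helper: split a sentence into alphabetic tokens and stem each one."""
    # Keep runs of ASCII letters only, so punctuation never reaches the stemmer.
    tokens = re.findall(r"[A-Za-z]+", sentence)
    # Lowercase explicitly so the result does not depend on the stemmer's defaults.
    return [stemmer.stem(token.lower()) for token in tokens]


print(stem_sentence("He is running fast."))
```

The letters-only pattern is deliberately crude: it drops digits and splits contractions such as "don't" into two tokens, which may or may not suit a given task.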

Quick Reference

Method / Note | Description
PorterStemmer() | Create a new PorterStemmer object
stem(word) | Return the stemmed form of the given word
Works on single words | Input must be one word at a time
Not a lemmatizer | Stemming may produce non-words
Use for text normalization | Helps reduce word variants
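
To illustrate what "reducing word variants" can look like in practice, here is a minimal sketch that groups a list of words by their Porter stem. The helper name group_by_stem and the sample words are assumptions chosen for illustration; only PorterStemmer and its stem() method come from NLTK.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer


def group_by_stem(words):
    """Hypothetical helper: map each stem to the input words that share it."""
    stemmer = PorterStemmer()
    groups = defaultdict(list)
    for word in words:
        # Words with the same stem end up in the same list.
        groups[stemmer.stem(word)].append(word)
    return dict(groups)


words = ["running", "runs", "run", "connected", "connecting", "connection"]
for stem, variants in group_by_stem(words).items():
    print(f"{stem}: {variants}")
```

Grouping like this is one way to count or index related word forms together during preprocessing.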

Key Takeaways

PorterStemmer reduces words to their stems to simplify text.
Always stem individual words, not whole sentences.
Stemming is a heuristic and may produce stems that are not real words.
Use stemmer.stem(word) to get the stem of a word.
PorterStemmer is useful for text normalization in NLP preprocessing.