How to Use NLTK Stemmer in NLP for Text Processing

To use an NLTK stemmer in NLP, first import a stemmer like PorterStemmer from nltk.stem. Then create a stemmer object and call its stem() method on words to get their root forms, which helps simplify text analysis.

Syntax

The basic syntax to use an NLTK stemmer involves importing the stemmer class, creating an instance, and applying the stem() method to words.

  • from nltk.stem import PorterStemmer: imports the Porter stemmer class.
  • stemmer = PorterStemmer(): creates a stemmer object.
  • stemmer.stem(word): returns the stemmed form of the input word.
python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
word = 'running'
stemmed_word = stemmer.stem(word)
print(stemmed_word)
Output
run
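The stem() call also normalizes letter case. The short sketch below assumes a recent NLTK release, where stem() lowercases its input by default (recent versions appear to expose this as a to_lowercase argument, which is not used here). The word list is only illustrative.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Assumption: a recent NLTK release, where stem() lowercases by default,
# so differently cased inputs end up with the same stem.
for word in ["Running", "RUNNING", "running"]:
    print(word, "->", stemmer.stem(word))
```

This matters when stems are compared or counted, because the case of the original word is not preserved.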

Example

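This example shows how to stem a list of words using NLTK's PorterStemmer. Inflected forms such as 'running' and 'runs' reduce to the same stem, 'run', while 'runner' is left unchanged and 'easily' and 'fairly' become the non-words 'easili' and 'fairli'.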

python
from nltk.stem import PorterStemmer

words = ['running', 'runs', 'runner', 'easily', 'fairly']
stemmer = PorterStemmer()
stemmed_words = [stemmer.stem(word) for word in words]
print(stemmed_words)
Output
['run', 'run', 'runner', 'easili', 'fairli']
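To make it easier to see which word forms collapse together, the same list can be grouped by stem. This is a minimal sketch; the groups dictionary is an illustrative name and not part of NLTK.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer

words = ['running', 'runs', 'runner', 'easily', 'fairly']
stemmer = PorterStemmer()

# Map each stem to the original words that produced it
# ("groups" is a hypothetical helper name used only for this sketch).
groups = defaultdict(list)
for word in words:
    groups[stemmer.stem(word)].append(word)

for stem, members in groups.items():
    print(stem, members)
```

Words that share a stem, here 'running' and 'runs', land in the same group, which is what lets later analysis treat them as one term.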

Common Pitfalls

Common mistakes include:

  • Forgetting to install NLTK (pip install nltk) or to import the stemmer class before using it.
  • Confusing stemming with lemmatization; stemming strips suffixes by rule, so it can produce non-words such as 'easili'.
  • Applying the stemmer to a whole sentence without splitting it into words first.

Always tokenize text into words before stemming.

python
from nltk.stem import PorterStemmer

# Wrong: stemming a sentence string directly
stemmer = PorterStemmer()
sentence = 'He is running fast'
# The whole sentence is treated as one word: it comes back lowercased but unstemmed
print(stemmer.stem(sentence))

# Right: tokenize first, then stem each word
words = sentence.split()
stemmed = [stemmer.stem(word) for word in words]
print(stemmed)
Output
he is running fast
['he', 'is', 'run', 'fast']
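Splitting on whitespace leaves punctuation attached to words (for example 'fast,'), which the stemmer then treats as part of the word. One possible way around this, sketched below, is NLTK's regex-based wordpunct_tokenize, which needs no extra data downloads. The sample sentence and the isalpha() filter are assumptions made for illustration.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

stemmer = PorterStemmer()
text = "He is running fast, and she runs too."

# Split into runs of word characters and runs of punctuation
tokens = wordpunct_tokenize(text)

# Keep alphabetic tokens only (an illustrative filter; it also drops numbers)
words_only = [t for t in tokens if t.isalpha()]

stems = [stemmer.stem(w) for w in words_only]
print(stems)
```

With punctuation separated out first, 'running' and 'runs' both reduce to 'run' regardless of where they appear in the sentence.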

Quick Reference

Step           | Description                                  | Code Example
---------------|----------------------------------------------|------------------------------------------
Import Stemmer | Import the stemmer class from nltk.stem      | from nltk.stem import PorterStemmer
Create Stemmer | Make an instance of the stemmer              | stemmer = PorterStemmer()
Stem Word      | Apply the stem() method to a word            | stemmer.stem('running')  # returns 'run'
Stem List      | Stem multiple words using list comprehension | [stemmer.stem(w) for w in words]

Key Takeaways

  • Import and create an NLTK stemmer object before stemming words.
  • Use the stem() method on individual words, not full sentences.
  • Stemming reduces words to root forms but may produce non-words.
  • Always tokenize text into words before applying stemming.
  • PorterStemmer is a common choice for English stemming in NLTK.