How to Use WordNetLemmatizer in NLTK for NLP Tasks

Use WordNetLemmatizer from NLTK to convert words to their base or dictionary form, called lemmas. Initialize it with WordNetLemmatizer() and call lemmatize(word, pos), where pos is the word's part of speech. Passing the correct pos is what makes the result accurate.

Syntax

The WordNetLemmatizer class is used to create a lemmatizer object. The main method is lemmatize(word, pos='n'), where:

  • word: the word to lemmatize
  • pos: part of speech tag (default is 'n' for noun). Common tags are 'n' (noun), 'v' (verb), 'a' (adjective), 'r' (adverb).
python
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
lemma = lemmatizer.lemmatize('running', pos='v')

Example

This example shows how to lemmatize different words with their correct parts of speech to get their base forms.

python
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

# Pair each word with its part of speech: 'v' verb, 'a' adjective, 'n' noun
words = [('running', 'v'), ('better', 'a'), ('cats', 'n'), ('geese', 'n'), ('flying', 'v')]

# Lemmatize every word using its own pos tag
lemmas = {word: lemmatizer.lemmatize(word, pos=pos) for word, pos in words}

print(lemmas)
Output
{'running': 'run', 'better': 'good', 'cats': 'cat', 'geese': 'goose', 'flying': 'fly'}

Common Pitfalls

One common mistake is omitting or mis-specifying the pos tag, which leads to incorrect lemmas. For example, lemmatizing 'running' with the default noun tag returns 'running' unchanged instead of 'run'. Also, if the WordNet data has not been downloaded, calling lemmatize raises a LookupError; running nltk.download('wordnet') once fixes it.

python
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

# Wrong: no pos specified, defaults to noun
print(lemmatizer.lemmatize('running'))  # prints: running

# Right: specify verb
print(lemmatizer.lemmatize('running', pos='v'))  # prints: run
Output
running
run
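
In practice the pos letter usually has to be derived from a tagger rather than typed by hand. The sketch below assumes Penn Treebank-style tags (the kind nltk.pos_tag produces by default) and maps them to the four WordNet letters. The helper names to_wordnet_pos and lemmatize_tagged are hypothetical, not part of NLTK, and falling back to 'n' for unknown tags is an assumption chosen to match the lemmatizer's own default.

```python
def to_wordnet_pos(treebank_tag):
    """Map a Penn Treebank tag (e.g. 'VBG') to a WordNet pos letter."""
    if treebank_tag.startswith("JJ"):
        return "a"  # adjectives: JJ, JJR, JJS
    if treebank_tag.startswith("VB"):
        return "v"  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    if treebank_tag.startswith("RB"):
        return "r"  # adverbs: RB, RBR, RBS
    return "n"      # assumption: treat everything else as a noun


def lemmatize_tagged(tagged_tokens, lemmatizer):
    """Lemmatize (word, treebank_tag) pairs with any object exposing lemmatize(word, pos)."""
    return [
        lemmatizer.lemmatize(word, pos=to_wordnet_pos(tag))
        for word, tag in tagged_tokens
    ]
```

With NLTK installed, this could be called as lemmatize_tagged([('running', 'VBG'), ('cats', 'NNS')], WordNetLemmatizer()), passing the same lemmatizer object used in the examples above.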

Quick Reference

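| Parameter            | Description                                                       | Example                            |
|----------------------|-------------------------------------------------------------------|------------------------------------|
| word                 | The word to lemmatize                                             | 'cats'                             |
| pos                  | Part of speech tag: 'n' (noun), 'v' (verb), 'a' (adj), 'r' (adv)  | 'v' for verb                       |
| lemmatize(word, pos) | Returns the base form of the word                                 | 'run' from 'running' with pos='v'  |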

Key Takeaways

  • Always specify the correct part of speech (pos) for accurate lemmatization.
  • WordNetLemmatizer reduces words to their dictionary base form, called a lemma.
  • Import WordNetLemmatizer from nltk.stem and initialize it before use.
  • Without pos, lemmatize assumes a noun, which may give incorrect results.
  • Download the NLTK WordNet data with nltk.download('wordnet') if needed.
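
The download step can be folded into a small retry helper, sketched below. It relies on NLTK raising LookupError when a corpus is missing. The function name lemmatize_with_download is hypothetical, and the lemmatize and download callables are passed in as parameters (an illustrative design choice, not an NLTK convention) so the helper does not import NLTK itself.

```python
def lemmatize_with_download(lemmatize, download, word, pos="n"):
    """Call lemmatize(word, pos); on LookupError fetch 'wordnet' once and retry."""
    try:
        return lemmatize(word, pos)
    except LookupError:
        # The WordNet corpus is not installed yet: download it, then try again.
        download("wordnet")
        return lemmatize(word, pos)
```

With NLTK installed, a call might look like lemmatize_with_download(WordNetLemmatizer().lemmatize, nltk.download, 'cats'). If the second attempt also fails, the LookupError propagates to the caller instead of looping.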