How to Use WordNetLemmatizer in NLTK for NLP Tasks
Use WordNetLemmatizer from NLTK to convert words to their base or dictionary form, called lemmas. Initialize it with WordNetLemmatizer() and call lemmatize(word, pos), where pos is the part of speech; passing the right pos is what makes the result accurate.
Syntax
The WordNetLemmatizer class is used to create a lemmatizer object. The main method is lemmatize(word, pos='n'), where:
word: the word to lemmatize
pos: part of speech tag (default is 'n' for noun). Common tags are 'n' (noun), 'v' (verb), 'a' (adjective), 'r' (adverb).
```python
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
lemma = lemmatizer.lemmatize('running', pos='v')  # 'run'
```
Example
This example shows how to lemmatize different words with their correct parts of speech to get their base forms.
```python
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

# Words to lemmatize; each is paired with its part of speech below
words = ['running', 'better', 'cats', 'geese', 'flying']

lemmas = {
    'running': lemmatizer.lemmatize('running', pos='v'),  # verb
    'better': lemmatizer.lemmatize('better', pos='a'),    # adjective
    'cats': lemmatizer.lemmatize('cats', pos='n'),        # noun
    'geese': lemmatizer.lemmatize('geese', pos='n'),      # noun
    'flying': lemmatizer.lemmatize('flying', pos='v'),    # verb
}
print(lemmas)
```
Output
{'running': 'run', 'better': 'good', 'cats': 'cat', 'geese': 'goose', 'flying': 'fly'}
Common Pitfalls
A common mistake is omitting the pos tag or passing the wrong one, which yields incorrect lemmas. For example, lemmatizing 'running' with the default noun tag returns 'running' unchanged instead of 'run'. Another is skipping the WordNet data download: if the corpus is not installed, calling lemmatize raises a LookupError.
```python
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

# Wrong: no pos specified, defaults to noun
print(lemmatizer.lemmatize('running'))  # Output: running

# Right: specify verb
print(lemmatizer.lemmatize('running', pos='v'))  # Output: run
```
Output
running
run
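In practice the part of speech often comes from a tagger such as nltk.pos_tag, which emits Penn Treebank tags ('NNS', 'VBG', ...) rather than WordNet's single-letter codes. The sketch below shows one way to bridge the two. The helper name penn_to_wordnet is hypothetical (it is not part of NLTK), and the tagged list is hand-written sample data standing in for tagger output.

```python
def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag to a WordNet pos code (hypothetical helper)."""
    if tag.startswith('JJ'):   # JJ, JJR, JJS: adjectives
        return 'a'
    if tag.startswith('VB'):   # VB, VBD, VBG, VBN, VBP, VBZ: verbs
        return 'v'
    if tag.startswith('RB'):   # RB, RBR, RBS: adverbs (not RP, the particle tag)
        return 'r'
    return 'n'                 # nouns and everything else use the lemmatizer's default


# Hand-written (word, Penn tag) pairs, assumed to resemble tagger output
tagged = [('geese', 'NNS'), ('were', 'VBD'), ('flying', 'VBG'), ('quickly', 'RB')]

pairs = [(word, penn_to_wordnet(tag)) for word, tag in tagged]
print(pairs)
# [('geese', 'n'), ('were', 'v'), ('flying', 'v'), ('quickly', 'r')]
```

Each pair can then be passed on as lemmatizer.lemmatize(word, pos=code). Falling back to 'n' mirrors what lemmatize does when pos is omitted.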
Quick Reference
| Parameter | Description | Example |
|---|---|---|
| word | The word to lemmatize | 'cats' |
| pos | Part of speech tag: 'n' (noun), 'v' (verb), 'a' (adj), 'r' (adv) | 'v' for verb |
| lemmatize(word, pos) | Returns the base form of the word | 'run' from 'running' |
Key Takeaways
Always specify the correct part of speech (pos) for accurate lemmatization.
WordNetLemmatizer reduces words to their dictionary base form called lemmas.
Import WordNetLemmatizer from nltk.stem and initialize before use.
Without pos, lemmatize assumes noun, which may give incorrect results.
Download the WordNet data once with nltk.download('wordnet') before first use; without it, lemmatize raises a LookupError.
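The download step in the last takeaway can be made safe to run repeatedly. This is a minimal sketch, assuming the corpus is registered under the resource path 'corpora/wordnet'; the function name ensure_wordnet is hypothetical, while nltk.data.find and nltk.download are existing NLTK calls.

```python
def ensure_wordnet() -> None:
    """Download the WordNet corpus only if it is not already installed."""
    import nltk  # imported here so the helper can be defined before NLTK is needed

    try:
        # nltk.data.find raises LookupError when the resource is missing
        nltk.data.find('corpora/wordnet')
    except LookupError:
        # quiet=True suppresses the progress messages
        nltk.download('wordnet', quiet=True)
```

Call ensure_wordnet() once at startup, before creating the lemmatizer. If a later LookupError names a different resource (some NLTK releases also ask for 'omw-1.4'), it can be downloaded the same way.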
