Lemmatization helps us find the base form of a word. It makes text easier to understand and analyze by turning words like "running" into "run".
Lemmatization in NLP
Introduction
Lemmatization is useful in these situations:
When you want to clean text data before analyzing it.
When you need to group different forms of a word together.
When building search engines that must match different word forms.
When preparing text for machine learning models.
When summarizing or extracting key information from text.
Syntax
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
base_word = lemmatizer.lemmatize(word, pos='v')
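The pos parameter tells the lemmatizer the word's part of speech: 'n' for noun, 'v' for verb, 'a' for adjective, 'r' for adverb. The same word can have different base forms depending on its part of speech, so this helps get the correct one.
If you don't specify pos, the lemmatizer assumes the word is a noun.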
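In practice the part of speech often comes from a tagger that emits Penn Treebank tags (such as 'VBG' or 'NNS') rather than the single letters the lemmatizer expects. A minimal sketch of a mapping, assuming Penn Treebank style tags; the function name penn_to_wordnet_pos is made up for this illustration and is not part of NLTK:

```python
def penn_to_wordnet_pos(tag: str) -> str:
    """Map a Penn Treebank tag to a WordNet pos letter ('n', 'v', 'a', 'r')."""
    if tag.startswith('J'):   # JJ, JJR, JJS: adjectives
        return 'a'
    if tag.startswith('V'):   # VB, VBD, VBG, VBN, VBP, VBZ: verbs
        return 'v'
    if tag.startswith('RB'):  # RB, RBR, RBS: adverbs
        return 'r'
    return 'n'                # anything else falls back to noun, the lemmatizer's default


for tag in ['VBG', 'JJR', 'NNS', 'RB']:
    print(tag, '->', penn_to_wordnet_pos(tag))
```

The result can be passed straight to the lemmatizer, for example lemmatizer.lemmatize('running', pos=penn_to_wordnet_pos('VBG')).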
Examples
Returns 'run' because 'running' is a verb form.
lemmatizer.lemmatize('running', pos='v')
Returns 'good' because 'better' is an adjective and its base form is 'good'.
lemmatizer.lemmatize('better', pos='a')
Returns 'cat' by default assuming the word is a noun.
lemmatizer.lemmatize('cats')
Sample Model
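This program lemmatizes a list of words, treating every word as a verb (pos='v'). It prints the original list and then the lemmatized list. Words that are not verb forms may not be reduced as expected: 'geese', for example, stays 'geese', and 'better' is not mapped to 'good'.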
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

words = ['running', 'cats', 'better', 'geese', 'flying']
# Every word is treated as a verb here
lemmatized_words = [lemmatizer.lemmatize(word, pos='v') for word in words]

print('Original words:', words)
print('Lemmatized words:', lemmatized_words)
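Using a single part of speech for the whole list leaves some words unreduced. A sketch of the same program with one part of speech per word, assuming the tags are already known. The helper name lemmatize_pairs and the small lookup table are made up for this illustration; the table only stands in for the lemmatizer so that the snippet runs without NLTK data. For real use, pass WordNetLemmatizer().lemmatize instead of toy_lemmatize.

```python
from typing import Callable, List, Tuple


def lemmatize_pairs(pairs: List[Tuple[str, str]],
                    lemmatize: Callable[..., str]) -> List[str]:
    """Lemmatize each (word, pos) pair with its own part of speech."""
    return [lemmatize(word, pos=pos) for word, pos in pairs]


# Hypothetical stand-in for a real lemmatizer: a tiny hand-written lookup table.
_TOY_LEMMAS = {
    ('running', 'v'): 'run',
    ('cats', 'n'): 'cat',
    ('better', 'a'): 'good',
    ('geese', 'n'): 'goose',
    ('flying', 'v'): 'fly',
}


def toy_lemmatize(word: str, pos: str = 'n') -> str:
    """Return the table entry, or the word unchanged if it is not listed."""
    return _TOY_LEMMAS.get((word, pos), word)


pairs = [('running', 'v'), ('cats', 'n'), ('better', 'a'),
         ('geese', 'n'), ('flying', 'v')]
print('Lemmatized words:', lemmatize_pairs(pairs, toy_lemmatize))
```

Keeping the lemmatizer as a parameter makes it easy to swap the stand-in for the NLTK lemmatizer without changing the rest of the code.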
Important Notes
Lemmatization differs from stemming: it returns real dictionary words, while stemming strips suffixes and can leave fragments that are not words.
Using the correct part of speech (pos) improves lemmatization accuracy.
You may need to download NLTK data packages like 'wordnet' before using the lemmatizer.
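The download step can be done once from Python. A minimal sketch, assuming NLTK is installed and network access is available; the function name ensure_wordnet is made up for this illustration:

```python
def ensure_wordnet() -> None:
    """Download the WordNet corpus only if NLTK cannot already find it."""
    import nltk  # imported here so that defining the function does not require NLTK

    try:
        nltk.data.find('corpora/wordnet')  # raises LookupError when the corpus is missing
    except LookupError:
        nltk.download('wordnet')
```

Call ensure_wordnet() once before creating the WordNetLemmatizer.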
Summary
Lemmatization finds the base form of words to simplify text.
It helps group word forms for better text analysis.
Always specify the part of speech for best results.