
Lemmatization in NLP

Introduction

Lemmatization reduces a word to its base dictionary form, called its lemma: "running" becomes "run" and "geese" becomes "goose". Treating the different forms of a word as one word makes text easier to analyze.

Lemmatization is useful:
When you want to clean text data before analyzing it.
When you need to group different forms of a word together.
When building search engines, so that a query matches different forms of the same word.
When preparing text for machine learning models.
When summarizing or extracting key information from text.
Syntax
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

word = 'running'  # the word to reduce to its base form
base_word = lemmatizer.lemmatize(word, pos='v')  # pos='v': treat the word as a verb

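The pos parameter tells the lemmatizer the word's part of speech, for example 'n' (noun), 'v' (verb), 'a' (adjective), or 'r' (adverb). The lemmatizer looks the word up under that part of speech, so the right tag gives the right base form.

If you omit pos, it defaults to 'n' and the word is treated as a noun.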

Examples
Returns 'run', because 'running' is a form of the verb "run":
lemmatizer.lemmatize('running', pos='v')

Returns 'good', because 'better' is the comparative form of the adjective "good":
lemmatizer.lemmatize('better', pos='a')

Returns 'cat', because with no pos given the word is treated as a noun:
lemmatizer.lemmatize('cats')
Sample Program

This program lemmatizes a list of words, treating every word as a verb. It prints the original list and the lemmatized list.

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

words = ['running', 'cats', 'better', 'geese', 'flying']

lemmatized_words = [lemmatizer.lemmatize(word, pos='v') for word in words]

print('Original words:', words)
print('Lemmatized words:', lemmatized_words)
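
Because every word is tagged as a verb, 'running' and 'flying' become 'run' and 'fly', but 'geese' stays unchanged (it is not a verb form) and 'better' stays 'better' (it is also a verb in its own right, meaning "to improve"). Tagging each word with its actual part of speech fixes this.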
Important Notes

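Lemmatization differs from stemming: a stemmer chops off suffixes by rule and can leave fragments that are not words, while a lemmatizer looks words up in a dictionary (WordNet, in the case of WordNetLemmatizer) and returns real words.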

Using the correct part of speech (pos) improves lemmatization accuracy.

WordNetLemmatizer needs the 'wordnet' data package. If it is not installed, the first call raises a LookupError; download it once with nltk.download('wordnet').

Summary

Lemmatization finds the base form of words to simplify text.

It helps group word forms for better text analysis.

Always specify the part of speech for best results.