What is Lemmatization in NLP: Definition and Examples

In Natural Language Processing (NLP), lemmatization is the process of converting words to their base or dictionary form, called a lemma. It groups the different inflected forms of a word together, for example mapping "running" and "ran" to "run", so that they can be analyzed as a single item.

How It Works

Lemmatization works by looking at the context and the part of speech of a word to find its base form, called the lemma. Imagine you have different forms of a word like "running," "ran," and "runs." Lemmatization groups all these forms under the base word "run."

Think of it like organizing a messy drawer where you put all similar items together. Instead of treating "cats" and "cat" as different things, lemmatization treats them as the same word, which helps computers understand text better.
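
The role of part of speech can be sketched with a toy lookup table. This is only an illustration: the table `TOY_LEMMAS` and the function `toy_lemmatize` are hypothetical names with hand-written entries, not part of any library, and a real lemmatizer consults a full dictionary instead.

```python
# Toy lemma table keyed by (word, part of speech).
# The entries are hand-written for illustration only.
TOY_LEMMAS = {
    ("running", "VERB"): "run",
    ("ran", "VERB"): "run",
    ("runs", "VERB"): "run",
    ("cats", "NOUN"): "cat",
    ("saw", "VERB"): "see",   # "I saw a bird"
    ("saw", "NOUN"): "saw",   # "a sharp saw"
}

def toy_lemmatize(word, pos):
    """Return the lemma for (word, pos), or the lowercased word if unknown."""
    return TOY_LEMMAS.get((word.lower(), pos), word.lower())

print(toy_lemmatize("ran", "VERB"))   # run
print(toy_lemmatize("saw", "VERB"))   # see
print(toy_lemmatize("saw", "NOUN"))   # saw
```

The same surface form "saw" maps to two different lemmas, which is why the part of speech has to be known before the lookup.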

Example

This example uses Python's NLTK library to lemmatize words and show their base forms.

python
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet

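# Download required data (resource names depend on the NLTK version)
nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')
# Newer NLTK releases may look for these resource names instead
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('punkt_tab')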

lemmatizer = WordNetLemmatizer()

# Helper function to convert NLTK (Penn Treebank) POS tags to WordNet POS tags
def get_wordnet_pos(tag):
    if tag.startswith('J'):
        return wordnet.ADJ
    elif tag.startswith('V'):
        return wordnet.VERB
    elif tag.startswith('N'):
        return wordnet.NOUN
    elif tag.startswith('R'):
        return wordnet.ADV
    else:
        # Fall back to noun, which is also the lemmatizer's default
        return wordnet.NOUN

sentence = "The striped bats are hanging on their feet and ran quickly"
words = nltk.word_tokenize(sentence)

# Get POS tags
pos_tags = nltk.pos_tag(words)

# Lemmatize each word with its POS tag
lemmatized_words = [lemmatizer.lemmatize(word, get_wordnet_pos(pos)) for word, pos in pos_tags]

print('Original words:', words)
print('Lemmatized words:', lemmatized_words)
Output
Original words: ['The', 'striped', 'bats', 'are', 'hanging', 'on', 'their', 'feet', 'and', 'ran', 'quickly']
Lemmatized words: ['The', 'striped', 'bat', 'be', 'hang', 'on', 'their', 'foot', 'and', 'run', 'quickly']

When to Use

Lemmatization is useful when you want to analyze text by its meaning rather than its exact form. It helps in applications such as search engines, chatbots, and text summarization by treating different forms of a word as the same.

For example, if a search engine sees "running" and "ran," lemmatization helps it understand both relate to "run," so it can find more relevant results. It is especially helpful when you want to reduce the complexity of text data without losing meaning.
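
The search scenario can be sketched as a tiny index keyed by lemma. This is a minimal sketch under stated assumptions: the lemma table `LEMMAS`, the sample documents, and the helper names `lemma_of`, `build_index`, and `search` are all made up for illustration, and a real system would use a proper lemmatizer and tokenizer.

```python
from collections import defaultdict

# Hand-written lemma table for illustration only (hypothetical data).
LEMMAS = {"running": "run", "ran": "run", "runs": "run", "dogs": "dog"}

def lemma_of(word):
    """Look up the lemma of a word, falling back to its lowercase form."""
    word = word.lower()
    return LEMMAS.get(word, word)

def build_index(docs):
    """Map each lemma to the ids of documents containing any of its forms."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.split():
            index[lemma_of(word)].add(doc_id)
    return index

def search(index, query):
    """Return the sorted ids of documents matching the query word's lemma."""
    return sorted(index.get(lemma_of(query), set()))

docs = ["She ran home", "He runs daily", "Dogs bark"]
index = build_index(docs)
print(search(index, "running"))  # [0, 1]
```

The query "running" appears in neither document, yet both are found because the query and the indexed words share the lemma "run".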

Key Points

  • Lemmatization converts words to their dictionary base form called a lemma.
  • It uses the word's context and part of speech to find the correct base form.
  • It helps group different forms of a word to improve text understanding.
  • Commonly used in search, text analysis, and natural language understanding tasks.

Key Takeaways

  • Lemmatization reduces words to their base dictionary form, called a lemma.
  • It considers the word's part of speech to find the correct base form.
  • It improves text analysis by grouping different word forms together.
  • Use lemmatization in NLP applications such as search engines and chatbots for better understanding.
  • Lemmatization differs from stemming: stemming strips suffixes by rule and can produce non-words, while lemmatization returns real dictionary words.
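
The contrast with stemming can be illustrated with a rough sketch. Both functions here are simplified stand-ins: `naive_stem` is a hypothetical suffix stripper (not the Porter algorithm or any library stemmer), and `lookup_lemma` uses a hand-written table in place of a real dictionary.

```python
def naive_stem(word):
    """Strip a few common suffixes by rule, without consulting a dictionary."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Hand-written lemma table for illustration only.
LEMMA_TABLE = {"studies": "study", "feet": "foot", "ran": "run"}

def lookup_lemma(word):
    """Return the dictionary form if known, otherwise the word itself."""
    return LEMMA_TABLE.get(word, word)

for w in ("studies", "feet", "ran"):
    print(w, "->", naive_stem(w), "|", lookup_lemma(w))
# studies -> studi | study
# feet -> feet | foot
# ran -> ran | run
```

The rule-based stemmer produces the non-word "studi" and misses the irregular forms "feet" and "ran", while the dictionary lookup returns real words for all three.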