
Lemmatization in spaCy

Introduction

Lemmatization reduces a word to its base (dictionary) form, called the lemma. It makes text easier to analyze by treating inflected forms of a word, such as 'runs', 'ran', and 'running', as a single item.

Typical uses:
When you want to count how often a word appears, regardless of its inflected form.
When you need to compare words in their base form for search or matching.
When cleaning text data before training a language model.
When analyzing text for its meaning without the noise of inflectional endings.
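
The first use case, counting a word across its forms, can be sketched as follows. This is a minimal illustration that assumes the en_core_web_sm pipeline is installed; the helper name lemma_frequencies and the sample sentence are made up for this example and are not part of spaCy.

```python
from collections import Counter


def lemma_frequencies(doc):
    """Count lowercase lemmas of the alphabetic tokens in a spaCy Doc."""
    return Counter(token.lemma_.lower() for token in doc if token.is_alpha)


# Assumption: spaCy and en_core_web_sm may or may not be installed,
# so the demo below is skipped when loading fails.
try:
    import spacy
    nlp = spacy.load("en_core_web_sm")
except (ImportError, OSError):
    nlp = None

if nlp is not None:
    doc = nlp("She runs daily. He ran yesterday. They are running now.")
    # Forms that share a lemma are counted under one key.
    print(lemma_frequencies(doc).most_common(3))
```

Filtering on token.is_alpha keeps punctuation and numbers out of the counts; drop that condition if those tokens matter for your analysis.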
Syntax
import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp('running runs ran')
lemmas = [token.lemma_ for token in doc]

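Use token.lemma_ to get the base form (lemma) of each token as a string. Note the trailing underscore: token.lemma without it returns an integer hash ID instead of text.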

Load a trained spaCy pipeline such as en_core_web_sm before lemmatizing. If it is not installed yet, download it first with: python -m spacy download en_core_web_sm
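
A hedged sketch of that loading step, showing one way to cope with a missing model and to view each token next to its lemma. The helper names load_english_model and lemma_pairs are hypothetical, chosen for this example only.

```python
def lemma_pairs(doc):
    """Return (surface form, lemma) pairs for every token in a spaCy Doc."""
    return [(token.text, token.lemma_) for token in doc]


def load_english_model(name="en_core_web_sm"):
    """Load a spaCy pipeline, or return None if spaCy or the model is missing."""
    try:
        import spacy
        return spacy.load(name)
    except (ImportError, OSError):
        # spacy.load raises OSError when the model package is not installed.
        # Install it with: python -m spacy download en_core_web_sm
        return None


nlp = load_english_model()
if nlp is not None:
    for text, lemma in lemma_pairs(nlp("running runs ran")):
        print(f"{text!r} -> {lemma!r}")
```

Returning None keeps the sketch short; in a real script you may prefer to let the error propagate so a missing model is noticed immediately.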

Examples
This example shows lemmatization of plural and verb forms.
import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp('cats are running')
lemmas = [token.lemma_ for token in doc]
print(lemmas)
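Lemmatization can also handle irregular forms, such as comparative and superlative adjectives. The result depends on the part-of-speech tag each token receives, so isolated words like the ones below may not all map to the lemma you expect.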
import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp('better best good')
lemmas = [token.lemma_ for token in doc]
print(lemmas)
Sample Program

This program loads spaCy's English model, processes a sentence, and prints the base form of each token.

import spacy

# Load English model
nlp = spacy.load('en_core_web_sm')

# Text with different word forms
text = 'The children are playing and played in the playground.'

doc = nlp(text)

# Extract lemmas
lemmas = [token.lemma_ for token in doc]

print('Original text:', text)
print('Lemmatized tokens:', lemmas)
Important Notes

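The correct lemma depends on how a word is used in context, so spaCy's English lemmatizer relies on part-of-speech tags. For example, 'meeting' used as a verb is reduced to 'meet', while 'meeting' used as a noun stays 'meeting'.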
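
To inspect how tags and lemmas line up, a sketch like the following prints each token with its coarse part-of-speech tag (token.pos_) and its lemma. The helper name lemmas_with_pos and the two sample sentences are illustrative assumptions, and the exact tags and lemmas you see depend on the model version.

```python
def lemmas_with_pos(doc):
    """Return (text, part-of-speech tag, lemma) triples for a spaCy Doc."""
    return [(token.text, token.pos_, token.lemma_) for token in doc]


# Assumption: spaCy and en_core_web_sm may or may not be installed,
# so the demo below is skipped when loading fails.
try:
    import spacy
    nlp = spacy.load("en_core_web_sm")
except (ImportError, OSError):
    nlp = None

if nlp is not None:
    # The same surface form appears once in a verb position and once as a noun.
    for sentence in ("We are meeting tomorrow.", "The meeting ran late."):
        for text, pos, lemma in lemmas_with_pos(nlp(sentence)):
            print(f"{text:10} {pos:6} {lemma}")
        print()
```

If a lemma looks wrong, checking the tag in the middle column is usually the first step, since a mistagged token is often the cause.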

Stop words are lemmatized like any other token. 'the' is unchanged because it is already a base form, but a stop word such as 'are' is still reduced to 'be'.
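
To check how stop words behave in a given pipeline, the following sketch lists only the tokens spaCy flags with token.is_stop, each next to its lemma. The helper name stop_word_lemmas and the sample sentence are assumptions made for this example.

```python
def stop_word_lemmas(doc):
    """Return (text, lemma) pairs for the tokens spaCy flags as stop words."""
    return [(token.text, token.lemma_) for token in doc if token.is_stop]


# Assumption: spaCy and en_core_web_sm may or may not be installed,
# so the demo below is skipped when loading fails.
try:
    import spacy
    nlp = spacy.load("en_core_web_sm")
except (ImportError, OSError):
    nlp = None

if nlp is not None:
    doc = nlp("The children were playing and they are tired.")
    for text, lemma in stop_word_lemmas(doc):
        print(f"{text!r} -> {lemma!r}")
```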

Summary

Lemmatization finds the base form of words to simplify text analysis.

Use token.lemma_ in spaCy after loading a language model.

It lets you treat different forms of a word as the same word, which makes counting, searching, and comparing text more reliable.