
Lemmatization in spaCy in NLP - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is lemmatization in natural language processing?
Lemmatization is the process of converting a word to its base or dictionary form, called a lemma. For example, 'running' becomes 'run'. Grouping the different forms of a word (run, runs, running, ran) under one lemma lets later processing, such as counting words or matching search terms, treat them as the same word.
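To make the grouping idea concrete, here is a minimal pure-Python sketch. It is not spaCy: `LEMMA_TABLE` and `lemmatize` are hypothetical names, and the tiny hand-written table stands in for the much larger resources a real lemmatizer uses. Several surface forms are mapped to one lemma and then counted together.

```python
from collections import Counter

# Hypothetical, hand-written lemma table for illustration only;
# real lemmatizers use far larger tables plus rules.
LEMMA_TABLE = {
    "running": "run",
    "runs": "run",
    "ran": "run",
    "cats": "cat",
    "mice": "mouse",
    "better": "good",
}


def lemmatize(word: str) -> str:
    """Return the lemma from the toy table, or the lowercased word itself."""
    lowered = word.lower()
    return LEMMA_TABLE.get(lowered, lowered)


words = ["Run", "running", "runs", "ran", "cats", "cat", "mice"]
counts = Counter(lemmatize(w) for w in words)
print(counts)  # Counter({'run': 4, 'cat': 2, 'mouse': 1})
```

Four different spellings of "run" end up as a single count, which is the practical benefit described above.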
intermediate
How does spaCy perform lemmatization?
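spaCy pipelines include a lemmatizer component. In the English pipelines, such as en_core_web_sm, it runs in rule-based mode: it takes the part-of-speech tag assigned earlier in the pipeline and applies exception tables and suffix rules to find the lemma. Because the tag depends on context, the same word can receive different lemmas depending on how it is used in the sentence. Some other languages use a simpler lookup-table mode that does not consider context.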
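The rule-based idea can be sketched in plain Python, assuming tiny hypothetical tables (`EXCEPTIONS`, `SUFFIX_RULES`, `KNOWN_BASE_FORMS`). This is a simplified illustration of the general approach (check irregular forms for the given part of speech, then try suffix rules and accept only known base forms), not spaCy's actual implementation or data.

```python
# Hypothetical toy tables; real rule-based lemmatizers ship far larger ones.
EXCEPTIONS = {
    "AUX": {"are": "be", "is": "be", "was": "be"},
    "VERB": {"ran": "run", "running": "run"},
    "NOUN": {"mice": "mouse"},
}
SUFFIX_RULES = {
    "VERB": [("ing", ""), ("ed", ""), ("s", "")],
    "NOUN": [("ies", "y"), ("s", "")],
}
KNOWN_BASE_FORMS = {
    "VERB": {"walk", "jump"},
    "NOUN": {"cat", "city"},
}


def lemmatize(word: str, pos: str) -> str:
    """Toy rule-based lemmatizer: exceptions first, then suffix rules."""
    form = word.lower()
    # 1. Irregular forms are looked up directly for this part of speech.
    exceptions = EXCEPTIONS.get(pos, {})
    if form in exceptions:
        return exceptions[form]
    # 2. Regular forms: apply suffix rules, accepting a candidate only if
    #    it is a known base form for this part of speech.
    for old, new in SUFFIX_RULES.get(pos, []):
        if form.endswith(old):
            candidate = form[: len(form) - len(old)] + new
            if candidate in KNOWN_BASE_FORMS.get(pos, set()):
                return candidate
    # 3. Otherwise leave the (lowercased) word as it is.
    return form


print(lemmatize("running", "VERB"))  # run
print(lemmatize("running", "NOUN"))  # running
print(lemmatize("walked", "VERB"))   # walk
print(lemmatize("cities", "NOUN"))   # city
print(lemmatize("are", "AUX"))       # be
```

The first two calls show why the part-of-speech tag matters: the same string gets a different lemma depending on the tag passed in.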
beginner
Which spaCy attribute gives the lemma of a token?
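The attribute is token.lemma_. It returns the lemma of each token in the processed text as a string. The attribute without the trailing underscore, token.lemma, returns the lemma's integer hash ID in the vocabulary's string store.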
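A small sketch of the two attributes, assuming spaCy v3 is installed. It builds a Doc by hand with `Doc(vocab, words=..., lemmas=...)` so that no trained pipeline has to be downloaded; the lemmas are supplied manually rather than predicted, purely for illustration.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Build a Doc by hand so no trained pipeline is needed. The lemmas are
# supplied manually here; a loaded pipeline would predict them instead.
words = ["The", "cats", "are", "running"]
lemmas = ["the", "cat", "be", "run"]
doc = Doc(Vocab(), words=words, lemmas=lemmas)

for token in doc:
    # lemma_ is the readable string; lemma is its integer hash ID.
    print(token.text, token.lemma_, token.lemma)

# The hash maps back to the string through the shared StringStore.
print(doc.vocab.strings[doc[1].lemma])
```

In everyday code you will almost always want token.lemma_; the hash form is mainly useful for compact storage and fast comparisons.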
intermediate
Why is lemmatization better than simple stemming?
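Lemmatization returns real dictionary words as base forms and takes context and part of speech into account, while stemming just cuts off word endings and may produce non-words. For example, the Porter stemmer reduces 'studies' to 'studi', whereas a lemmatizer returns 'study'. Lemmatization therefore gives more accurate and meaningful base forms.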
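The contrast can be illustrated with two toy functions: `crude_stem` strips a fixed list of suffixes (a deliberately naive stand-in, not the Porter algorithm or any library stemmer), while `lemmatize` looks words up in a hypothetical hand-written table. Both names and the data are assumptions made for this sketch.

```python
# Naive suffix stripping, checked in order (longer suffixes first).
SUFFIXES = ("ing", "ies", "es", "ed", "ly", "s")

# Hypothetical lemma table used only for this comparison.
LEMMAS = {"running": "run", "studies": "study", "better": "good", "mice": "mouse"}


def crude_stem(word: str) -> str:
    """Chop off the first matching suffix, keeping at least three characters."""
    form = word.lower()
    for suffix in SUFFIXES:
        if form.endswith(suffix) and len(form) - len(suffix) >= 3:
            return form[: -len(suffix)]
    return form


def lemmatize(word: str) -> str:
    """Return the dictionary form from the table, else the lowercased word."""
    form = word.lower()
    return LEMMAS.get(form, form)


for word in ["running", "studies", "better", "mice"]:
    print(f"{word:<8} stem={crude_stem(word):<7} lemma={lemmatize(word)}")
```

The stems 'runn' and 'stud' are not words, and suffix stripping cannot relate 'better' to 'good' or 'mice' to 'mouse'; the lookup returns proper dictionary forms for all four.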
beginner
Show a simple Python code snippet using spaCy to lemmatize the sentence: 'The cats are running quickly.'
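```python
import spacy

# Requires the model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
# Processing the text assigns POS tags, which the lemmatizer relies on.
doc = nlp("The cats are running quickly.")
# Collect each token's lemma as a string (punctuation included).
lemmas = [token.lemma_ for token in doc]
print(lemmas)
```

With spaCy v3 and en_core_web_sm, this prints: ['the', 'cat', 'be', 'run', 'quickly', '.']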
What does the spaCy attribute token.lemma_ return?
A. The word's frequency in the text
B. The part-of-speech tag
C. The original word text
D. The base form of the word
Which of these is a benefit of lemmatization over stemming?
A. Removes stop words automatically
B. Runs faster than stemming
C. Produces real dictionary words
D. Ignores word context
In spaCy, what must you do before accessing token.lemma_?
A. Load a language model and process text with nlp()
B. Manually define lemmas for each word
C. Call a separate lemmatization function
D. Nothing, it works on raw text
What is the lemma of the word 'running' (used as a verb) in spaCy's en_core_web_sm English model?
A. ran
B. run
C. running
D. runner
Which spaCy model is commonly used for English lemmatization?
A. en_core_web_sm
B. fr_core_news_sm
C. de_core_news_sm
D. xx_ent_wiki_sm
Explain what lemmatization is and how spaCy helps perform it.
Think about how spaCy finds the base form of words using its models.
Write a short Python code example using spaCy to lemmatize a sentence and print the lemmas.
Use nlp() to process the text and a list comprehension to collect the lemmas.