Recall & Review
beginner
What is lemmatization in natural language processing?
Lemmatization is the process of converting a word to its base or dictionary form, called a lemma. For example, 'running' becomes 'run'. Because it maps different inflected forms of the same word (such as 'runs', 'ran' and 'running') to one lemma, those forms can be treated as the same word.
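The grouping idea can be sketched in plain Python. This is a minimal illustration, assuming a tiny hand-written lemma table; LEMMA_TABLE and lemmatize are hypothetical names, not part of spaCy. Every form of 'run' maps to the same lemma, so all of them land in one group.

```python
# Hypothetical, hand-written lemma table: inflected form -> lemma.
LEMMA_TABLE = {
    "running": "run", "runs": "run", "ran": "run",
    "cats": "cat", "are": "be", "is": "be", "was": "be",
}

def lemmatize(word: str) -> str:
    # Look up the lowercased word; fall back to the word itself if unknown.
    w = word.lower()
    return LEMMA_TABLE.get(w, w)

# Group surface forms under their shared lemma.
words = ["Running", "runs", "ran", "run"]
groups = {}
for w in words:
    groups.setdefault(lemmatize(w), []).append(w)

print(groups)  # {'run': ['Running', 'runs', 'ran', 'run']}
```

A real lemmatizer needs a far larger table plus rules, but the effect is the same: four surface forms become one item.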
intermediate
How does spaCy perform lemmatization?
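spaCy's trained English pipelines include a lemmatizer component. In its default rule-based mode it reads the part-of-speech tag that earlier pipeline components assigned to the token, then consults exception tables for irregular forms and applies suffix rules for that part of speech. Because the tag depends on the surrounding words, the same word can receive different lemmas in different contexts.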
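To show why the part of speech matters, here is a simplified rule-based lemmatizer in plain Python. It is only loosely modelled on the "exceptions first, then suffix rules per POS" idea; the tables, the rule order and the lemmatize function are hypothetical and much smaller than spaCy's real data and logic.

```python
# Hypothetical exception table: irregular forms keyed by (word, POS).
EXCEPTIONS = {
    ("are", "AUX"): "be",
    ("was", "AUX"): "be",
    ("ran", "VERB"): "run",
    ("running", "VERB"): "run",  # plain "ing" stripping would give "runn"
    ("mice", "NOUN"): "mouse",
}

# Hypothetical suffix rules per POS: (suffix to strip, replacement).
RULES = {
    "NOUN": [("ies", "y"), ("s", "")],
    "VERB": [("ing", ""), ("ed", ""), ("s", "")],
}

def lemmatize(word: str, pos: str) -> str:
    word = word.lower()
    # 1. Irregular forms win.
    if (word, pos) in EXCEPTIONS:
        return EXCEPTIONS[(word, pos)]
    # 2. Otherwise apply the first matching suffix rule for this POS.
    for suffix, replacement in RULES.get(pos, []):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)] + replacement
    # 3. No rule applies (e.g. adverbs here): keep the word.
    return word

print(lemmatize("meeting", "NOUN"))  # meeting
print(lemmatize("meeting", "VERB"))  # meet
print(lemmatize("stories", "NOUN"))  # story
print(lemmatize("are", "AUX"))       # be
```

The two "meeting" calls are the point: the word is identical, but the tag (which a real pipeline predicts from context) selects different rules.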
beginner
Which spaCy attribute gives the lemma of a token?
The attribute is token.lemma_. It returns the token's lemma as a string. token.lemma (without the trailing underscore) holds the same lemma as an integer hash ID.
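A small sketch of the string attribute versus its hash counterpart. It assumes spaCy v3 is installed but avoids downloading a model: spacy.blank("en") gives a tokenizer-only pipeline, so the lemmas are set by hand here purely for illustration (a trained pipeline would assign them).

```python
import spacy

# Tokenizer-only English pipeline; no trained model or lemmatizer needed.
nlp = spacy.blank("en")
doc = nlp("cats running")

# Assign lemmas manually for the demo; normally the lemmatizer does this.
doc[0].lemma_ = "cat"
doc[1].lemma_ = "run"

lemmas = [token.lemma_ for token in doc]    # lemma strings
lemma_ids = [token.lemma for token in doc]  # integer hash IDs

print(lemmas)                                     # ['cat', 'run']
print(lemma_ids[0] == nlp.vocab.strings["cat"])   # True
```

In practice you almost always want lemma_; the hash form is mainly useful for compact storage or array-based processing.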
intermediate
Why is lemmatization better than simple stemming?
Lemmatization returns real dictionary words as base forms, considering context and part of speech, while stemming just cuts word endings and may produce non-words. Lemmatization gives more accurate and meaningful results.
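The contrast can be made concrete with a deliberately crude stemmer next to a lemma lookup. Both naive_stem and the LEMMAS table are hypothetical toys written for this comparison; real stemmers such as Porter's use more elaborate rules, but they share the key property of never consulting a dictionary.

```python
def naive_stem(word: str) -> str:
    """Crude suffix stripping: chop a known ending, no dictionary check."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so very short words survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical lemma table for the same words.
LEMMAS = {"running": "run", "studies": "study", "mice": "mouse", "was": "be"}

for w in ["running", "studies", "mice", "was"]:
    print(f"{w:8} stem={naive_stem(w):8} lemma={LEMMAS[w]}")
```

The stemmer yields non-words ("runn", "studi") and cannot handle irregular forms ("mice", "was"), while the lemma column always contains real dictionary words.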
beginner
Show a simple Python code snippet using spaCy to lemmatize the sentence: 'The cats are running quickly.'
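import spacy

# Load the small English pipeline (install it first: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')
doc = nlp('The cats are running quickly.')
# Collect the lemma string of every token, punctuation included
lemmas = [token.lemma_ for token in doc]
print(lemmas)
This prints (exact output may vary slightly between model versions): ['the', 'cat', 'be', 'run', 'quickly', '.']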
What does the spaCy attribute token.lemma_ return?
token.lemma_ returns the lemma, which is the base or dictionary form of the word.
Which of these is a benefit of lemmatization over stemming?
Lemmatization produces real dictionary words by considering context, unlike stemming which may produce non-words.
In spaCy, what must you do before accessing token.lemma_?
You need to load a trained pipeline such as en_core_web_sm and process the text by calling nlp(); the tokens of the resulting Doc then carry their lemmas.
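Why a trained pipeline is required can be seen by trying a blank one. This sketch assumes spaCy v3 is installed; spacy.blank("en") builds only a tokenizer, with no tagger and no lemmatizer, so nothing ever fills in lemma_. Loading en_core_web_sm instead adds a "lemmatizer" entry to nlp.pipe_names.

```python
import spacy

# Tokenizer-only pipeline: no components that assign lemmas.
nlp = spacy.blank("en")
print(nlp.pipe_names)  # []

doc = nlp("The cats are running.")
lemmas = [t.lemma_ for t in doc]
# Every lemma_ is an empty string because no lemmatizer ran.
print(lemmas)
```

Tokenization alone works, but lemmas only appear once a pipeline containing a lemmatizer (and the tagging it depends on) has processed the text.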
What is the lemma of the word 'running' in spaCy's default English model?
The lemma of 'running' is 'run', the base form of the verb.
Which spaCy model is commonly used for English lemmatization?
en_core_web_sm is the small English model that supports lemmatization.
Explain what lemmatization is and how spaCy helps perform it.
Think about how spaCy finds the base form of words using its models.
Write a short Python code example using spaCy to lemmatize a sentence and print the lemmas.
Use nlp() to process text and a list comprehension to get lemmas.