Recall & Review
beginner
What is lemmatization in natural language processing?
Lemmatization is the process of converting a word to its base or dictionary form, called a lemma. For example, 'running' becomes 'run'. It helps in understanding the meaning of words by grouping different forms of the same word.
intermediate
How does spaCy perform lemmatization?
spaCy's trained pipelines include a lemmatizer component. For English, it applies rules and lookup tables keyed on each token's part of speech, which the pipeline predicts from context. This lets spaCy return the correct base form of words during text processing.
beginner
Which spaCy attribute gives the lemma of a token?
The attribute is token.lemma_. It returns the lemma as a string for each token in the processed text.
intermediate
Why is lemmatization better than simple stemming?
Lemmatization returns real dictionary words as base forms, considering context and part of speech, while stemming just cuts word endings and may produce non-words. Lemmatization gives more accurate and meaningful results.
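The difference can be sketched without any NLP library. The two functions below are toy stand-ins invented for this example: naive_stem is a crude suffix stripper and toy_lemma is a hand-written lookup table. Neither is spaCy's algorithm or any real stemmer.

```python
# Toy illustration only: neither function is spaCy's or any library's real algorithm.
SUFFIXES = ("ing", "es", "s")  # hypothetical, deliberately crude suffix list

def naive_stem(word: str) -> str:
    """Strip the first matching suffix without checking the result is a word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Hypothetical hand-written table standing in for a real lemma dictionary.
LEMMA_TABLE = {"studies": "study", "running": "run", "was": "be", "cats": "cat"}

def toy_lemma(word: str) -> str:
    """Look the word up; fall back to the word itself if it is unknown."""
    return LEMMA_TABLE.get(word, word)

words = ["studies", "running", "was", "cats"]
stems = [naive_stem(w) for w in words]
lemmas = [toy_lemma(w) for w in words]

print(stems)   # ['studi', 'runn', 'was', 'cat']
print(lemmas)  # ['study', 'run', 'be', 'cat']
```

The stems 'studi' and 'runn' are not dictionary words, and the irregular form 'was' is left untouched, while the lookup returns 'study', 'run', and 'be'.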
beginner
Show a simple Python code snippet using spaCy to lemmatize the sentence: 'The cats are running quickly.'
import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp('The cats are running quickly.')
lemmas = [token.lemma_ for token in doc]
print(lemmas)
This prints: ['the', 'cat', 'be', 'run', 'quickly', '.']
What does the spaCy attribute token.lemma_ return?
A. The word's frequency in the text
B. The part of speech tag
C. The original word text
D. The base form of the word
token.lemma_ returns the lemma, which is the base or dictionary form of the word.
Which of these is a benefit of lemmatization over stemming?
A. Removes stop words automatically
B. Runs faster than stemming
C. Produces real dictionary words
D. Ignores word context
Lemmatization produces real dictionary words by considering context, unlike stemming which may produce non-words.
In spaCy, what must you do before accessing token.lemma_?
A. Load a language model and process text with nlp()
B. Manually define lemmas for each word
C. Call a separate lemmatization function
D. Nothing, it works on raw text
You need to load a language model like en_core_web_sm and process text with nlp() to get tokens with lemmas.
What is the lemma of the word 'running' in spaCy's default English model?
A. ran
B. run
C. running
D. runner
The lemma of 'running' is 'run', the base form of the verb.
Which spaCy model is commonly used for English lemmatization?
A. en_core_web_sm
B. fr_core_news_sm
C. de_core_news_sm
D. xx_ent_wiki_sm
en_core_web_sm is the small English model that supports lemmatization.
Explain what lemmatization is and how spaCy helps perform it.
Think about how spaCy finds the base form of words using its models.
Write a short Python code example using spaCy to lemmatize a sentence and print the lemmas.
Use nlp() to process text and a list comprehension to get lemmas.
Practice
1. What does lemmatization do in natural language processing using spaCy?
easy
A. It removes all punctuation from the text.
B. It counts the number of words in a sentence.
C. It finds the base or dictionary form of a word.
D. It translates text into another language.
Solution
Step 1: Understand the purpose of lemmatization
Lemmatization simplifies words by converting them to their base form, like 'running' to 'run'.
Step 2: Compare options to definition
Only option C correctly describes finding the base or dictionary form of a word.
Final Answer:
It finds the base or dictionary form of a word. -> Option C
Quick Check:
Lemmatization = base form extraction [OK]
Hint: Lemmatization = find base word form [OK]
Common Mistakes:
Confusing lemmatization with token counting
Thinking it translates text
Mixing it up with punctuation removal
2. Which of the following is the correct way to get the lemma of a token in spaCy?
easy
A. token.lemma_
B. token.lemma
C. token.lemmatize()
D. token.get_lemma()
Solution
Step 1: Recall spaCy token attribute for lemma
spaCy uses the attribute lemma_ (with underscore) to get the lemma as a string.
Step 2: Check each option
token.lemma_ matches the correct attribute. token.lemma exists but returns an integer hash ID rather than the string, while token.lemmatize() and token.get_lemma() are not spaCy Token methods.
Final Answer:
token.lemma_ -> Option A
Quick Check:
spaCy lemma attribute = token.lemma_ [OK]
Hint: Use token.lemma_ with underscore for lemma string [OK]
Common Mistakes:
Using token.lemma without underscore
Trying to call a method like lemmatize()
Using non-existent methods like get_lemma()
3. Given the code snippet:
import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp('The cats are running fast')
lemmas = [token.lemma_ for token in doc]
What is the value of lemmas?
medium
A. ['the', 'cats', 'are', 'running', 'fast']
B. ['The', 'cats', 'are', 'running', 'fast']
C. ['The', 'cat', 'is', 'run', 'fast']
D. ['the', 'cat', 'be', 'run', 'fast']
Solution
Step 1: Understand spaCy lemmatization output
spaCy converts words to their base forms: 'cats' to 'cat', 'are' to 'be', 'running' to 'run', and lowercases 'The' to 'the'.
Final Answer:
['the', 'cat', 'be', 'run', 'fast'] -> Option D
Quick Check:
spaCy lemma list = ['the', 'cat', 'be', 'run', 'fast'] [OK]
Hint: Lemmas are base forms, usually lowercase [OK]
Common Mistakes:
Expecting original words instead of lemmas
Not lowercasing lemmas
Confusing verb forms like 'are' with 'is'
4. Identify the error in this spaCy lemmatization code:
import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp('She was eating apples')
lemmas = [token.lemma for token in doc]
print(lemmas)
medium
A. Missing parentheses in spacy.load()
B. Using token.lemma instead of token.lemma_
C. Incorrect model name in spacy.load()
D. Missing import for lemmatizer
Solution
Step 1: Check spaCy lemma attribute usage
spaCy tokens expose lemma_ (with underscore) for the lemma string; lemma (without underscore) holds the integer hash ID of that string.
Step 2: Identify the error in code
The code uses token.lemma, so it prints a list of integer hash IDs instead of the lemma strings.
Final Answer:
Using token.lemma instead of token.lemma_ -> Option B
Quick Check:
Use token.lemma_ for lemma string [OK]
Hint: Remember underscore in token.lemma_ for lemma [OK]
Common Mistakes:
Using token.lemma without underscore
Assuming spacy.load() is missing parentheses
Thinking model name is wrong
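The relationship between the two attributes can be checked directly. This is a minimal sketch that uses a blank English pipeline, so no trained model is needed; the lemma is set by hand purely for illustration, whereas a trained pipeline's lemmatizer would normally fill it in.

```python
import spacy

# A blank English pipeline is enough here: no trained model is required.
nlp = spacy.blank("en")
doc = nlp("running")
token = doc[0]

# Set the lemma by hand purely for illustration; a trained pipeline's
# lemmatizer component would normally assign it.
token.lemma_ = "run"

lemma_id = token.lemma      # integer hash ID
lemma_text = token.lemma_   # human-readable string

print(type(lemma_id).__name__, lemma_id)
print(lemma_text)

# The vocabulary's string store maps between the two representations.
print(nlp.vocab.strings[lemma_id])            # hash -> string
print(nlp.vocab.strings["run"] == lemma_id)   # string -> hash

# An int never equals a str, so comparing token.lemma to 'run' is always False.
print(lemma_id == "run")
```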
5. You want to lemmatize a list of sentences and count how many times the lemma 'run' appears using spaCy. Which code snippet correctly does this?
hard
A.
import spacy
nlp = spacy.load('en_core_web_sm')
sentences = ['I run daily', 'He is running fast']
count = 0
for sent in sentences:
    doc = nlp(sent)
    count += sum(token.lemma_ == 'run' for token in doc)
print(count)
B.
import spacy
nlp = spacy.load('en_core_web_sm')
sentences = ['I run daily', 'He is running fast']
count = 0
for sent in sentences:
    doc = nlp(sent)
    count += sum(token.text == 'run' for token in doc)
print(count)
C.
import spacy
nlp = spacy.load('en_core_web_sm')
sentences = ['I run daily', 'He is running fast']
count = 0
for sent in sentences:
    doc = nlp(sent)
    count += sum(token.lemma == 'run' for token in doc)
print(count)
D.
import spacy
nlp = spacy.load('en_core_web_sm')
sentences = ['I run daily', 'He is running fast']
count = 0
for sent in sentences:
    doc = nlp(sent)
    count += sum(token.lemma_ == 'running' for token in doc)
print(count)
Solution
Step 1: Understand the goal and spaCy usage
We want to count all tokens whose lemma is 'run', so we must use token.lemma_ and compare to 'run'.
Step 2: Analyze each option
Option A correctly compares token.lemma_ to 'run', so it counts both 'run' and 'running'.
Option B compares the original text (token.text), so it misses 'running'.
Option C uses token.lemma without the underscore, which is an integer hash ID and never equals the string 'run'.
Option D compares the lemma to 'running', which is not the base form.
Final Answer:
sum(token.lemma_ == 'run' for token in doc) -> Option A
Quick Check:
Count lemma 'run' using token.lemma_ == 'run' [OK]
Hint: Compare token.lemma_ to base word for counting [OK]