Lemmatization in spaCy in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
What is the output of this spaCy lemmatization code?
Given the following code snippet using spaCy, what will be the printed list of lemmas?
import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp('The striped bats are hanging on their feet for best')
lemmas = [token.lemma_ for token in doc]
print(lemmas)
A. ['the', 'striped', 'bat', 'be', 'hang', 'on', 'their', 'foot', 'for', 'good']
B. ['the', 'striped', 'bat', 'be', 'hang', 'on', 'their', 'feet', 'for', 'best']
C. ['The', 'striped', 'bats', 'are', 'hanging', 'on', 'their', 'feet', 'for', 'best']
D. ['the', 'striped', 'bat', 'are', 'hang', 'on', 'their', 'feet', 'for', 'best']
💡 Hint
Look at how spaCy converts plural nouns and verbs to their base forms.
Model Choice
intermediate
Which spaCy model is best for accurate lemmatization?
You want to perform lemmatization on English text with good accuracy and speed. Which spaCy model should you choose?
A. en_core_web_sm (small model)
B. en_vectors_web_lg (only word vectors, no lemmatization)
C. en_core_web_lg (large model)
D. en_core_web_md (medium model)
💡 Hint
Larger models usually predict linguistic annotations such as part-of-speech tags more accurately, and lemmatization depends on those tags.
Metrics
advanced
Which metric best evaluates lemmatization quality?
You have a dataset with gold-standard lemmas and your spaCy model's predicted lemmas. Which metric best measures lemmatization accuracy?
A. Exact match accuracy
B. Recall
C. Precision
D. F1 score
💡 Hint
A predicted lemma is either identical to the gold lemma or it is wrong; there is no partial credit.
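Exact match accuracy can be sketched as the share of tokens whose predicted lemma equals the gold lemma. The function name `exact_match_accuracy` and the two sample lists are hypothetical, chosen only for illustration; the data is made up and is not real model output.

```python
def exact_match_accuracy(gold, predicted):
    """Return the fraction of positions where the predicted lemma equals the gold lemma."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    matches = sum(g == p for g, p in zip(gold, predicted))
    return matches / len(gold)


# Made-up example data, not real model output.
gold = ["the", "bat", "be", "hang", "foot"]
predicted = ["the", "bat", "be", "hang", "feet"]
accuracy = exact_match_accuracy(gold, predicted)
print(accuracy)  # 4 of the 5 lemmas match
```

This assumes the gold and predicted lists are aligned token by token, which holds when both come from the same tokenization.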
🔧 Debug
advanced
Why does this spaCy lemmatization code print numbers instead of words?
Consider this code snippet:
import spacy
nlp = spacy.load('en_core_web_sm')
text = 'Cats running fast'
doc = nlp(text)
lemmas = [token.lemma for token in doc]
print(lemmas)
Why does it print a list of large integers rather than lemma strings?
A. The 'doc' object is not iterable
B. The 'nlp' object is not callable because of missing parentheses
C. The 'text' variable is not defined before use
D. 'token.lemma' returns the lemma's integer hash ID; the string form is 'token.lemma_'
💡 Hint
Check the attribute name for token lemmas in spaCy.
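The two attribute names can be compared directly. In spaCy v3, `token.lemma` holds an integer hash and `token.lemma_` holds the corresponding string. The sketch below builds a `Doc` by hand so that no trained pipeline is required; the lemmas are assigned manually for illustration and are not model predictions.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Build a Doc by hand so no trained pipeline is needed.
# The lemmas are assigned manually for illustration; they are not model predictions.
doc = Doc(Vocab(), words=["Cats", "running", "fast"], lemmas=["cat", "run", "fast"])

ids = [token.lemma for token in doc]       # integer hash IDs
strings = [token.lemma_ for token in doc]  # human-readable strings
print(ids)
print(strings)

# A hash can be mapped back to text through the vocabulary's string store.
recovered = [doc.vocab.strings[i] for i in ids]
print(recovered)
```

The same underscore convention applies to other token attributes, for example `token.pos` and `token.pos_`.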
🧠 Conceptual
expert
Why might spaCy lemmatization keep 'feet' as 'feet' instead of 'foot'?
Suppose spaCy returns 'feet' as the lemma of 'feet' instead of the expected singular 'foot'. What is the most likely reason?
A. spaCy's lemmatizer uses a dictionary-based approach that sometimes keeps irregular plurals unchanged
B. The lemmatizer relies on part-of-speech tags, and 'feet' is tagged as a plural noun, but the lemmatizer lacks irregular plural rules
C. The model's vocabulary does not include 'foot', so it cannot lemmatize 'feet' correctly
D. spaCy treats 'feet' as a plural noun but does not normalize irregular plurals to singular
💡 Hint
Think about how dictionary-based lemmatizers handle irregular forms.
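One way to investigate a surprising lemma is to print each token's part-of-speech tag next to its lemma. The helper name `lemma_report` is hypothetical, and the `Doc` below is built by hand with assumed annotations so the sketch runs without a trained pipeline.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab


def lemma_report(doc):
    """Return (text, coarse POS, lemma) for every token in a Doc."""
    return [(token.text, token.pos_, token.lemma_) for token in doc]


# Hand-built Doc with assumed annotations, not model predictions.
demo = Doc(
    Vocab(),
    words=["their", "feet"],
    pos=["PRON", "NOUN"],
    lemmas=["their", "foot"],
)
report = lemma_report(demo)
for row in report:
    print(row)
```

With a trained pipeline, the same helper can be called on `nlp(text)`. If 'feet' appears with an unexpected tag, the tagging step is a plausible cause of an unchanged lemma.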