Challenge - 5 Problems
Lemmatization Mastery
❓ Predict Output
Level: intermediate, time limit 2:00
What is the output of this spaCy lemmatization code?
Given the following code snippet using spaCy, what will be the printed list of lemmas?
import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp('The striped bats are hanging on their feet for best')
lemmas = [token.lemma_ for token in doc]
print(lemmas)
💡 Hint
Look at how spaCy converts plural nouns and verbs to their base forms.
✅ Explanation
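spaCy reduces 'bats' to 'bat', 'are' to 'be', and 'hanging' to 'hang'. Irregular forms such as 'feet' and 'best' cannot be handled by suffix rules; in spaCy v3's rule-based English lemmatizer they come from exception tables chosen by the part-of-speech tag the model assigns. Their lemmas therefore depend on the model version and on how each word is tagged: 'feet' stays 'feet' only if no exception entry maps it to 'foot' for its tag. Run the snippet against your installed model to confirm the exact list.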
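Because irregular forms depend on the tag the model assigns, a quick way to check a prediction is to print each token next to its coarse POS tag and its lemma. This is a minimal sketch; `lemma_report` is a hypothetical helper name, and it takes an already-loaded pipeline so you can pass `spacy.load('en_core_web_sm')` or any other model.

```python
from typing import List, Tuple

import spacy


def lemma_report(nlp: spacy.Language, text: str) -> List[Tuple[str, str, str]]:
    """Return (token text, coarse POS tag, lemma) for every token in `text`."""
    doc = nlp(text)
    # token.pos_ and token.lemma_ are the string forms of the tag and the lemma
    return [(token.text, token.pos_, token.lemma_) for token in doc]
```

For example, `lemma_report(spacy.load('en_core_web_sm'), 'The striped bats are hanging on their feet for best')` lists each lemma beside the tag that produced it. Loading the model requires it to be installed first (`python -m spacy download en_core_web_sm`).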
❓ Model Choice
Level: intermediate, time limit 1:30
Which spaCy model is best for accurate lemmatization?
You want to perform lemmatization on English text with good accuracy and speed. Which spaCy model should you choose?
💡 Hint
Larger models usually assign part-of-speech tags more accurately, and spaCy's lemmatizer relies on those tags.
✅ Explanation
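The large model (en_core_web_lg) is the usual pick here. In spaCy v3, the en_core_web_sm, en_core_web_md and en_core_web_lg pipelines all use the same rule-based lemmatizer, which selects its rules by part-of-speech tag, so the gain from the larger model comes from more accurate tagging (helped by its word vectors) rather than from a different lemmatizer. A vectors-only model does not include a lemmatizer at all.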
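To see which lemmatizer a pipeline actually ships, you can read the `mode` of its `lemmatizer` component. A minimal sketch, assuming spaCy v3; `lemmatizer_mode` is a hypothetical helper name.

```python
from typing import Optional

import spacy


def lemmatizer_mode(nlp: spacy.Language) -> Optional[str]:
    """Return the mode of the pipeline's "lemmatizer" component, or None if it has none."""
    if "lemmatizer" not in nlp.pipe_names:
        # e.g. a pipeline that ships word vectors but no lemmatizer
        return None
    return nlp.get_pipe("lemmatizer").mode
```

Calling it on `spacy.load('en_core_web_sm')` and on `spacy.load('en_core_web_lg')` (both must be installed) shows whether the two pipelines use the same lemmatizer mode, which helps separate "better lemmatizer" from "better tagger" when comparing them.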
❓ Metrics
Level: advanced, time limit 1:30
Which metric best evaluates lemmatization quality?
You have a dataset with gold-standard lemmas and your spaCy model's predicted lemmas. Which metric best measures lemmatization accuracy?
💡 Hint
Lemmatization is about exact word form matches.
✅ Explanation
Exact-match accuracy, the share of tokens whose predicted lemma is identical to the gold lemma, is the most direct measure of lemmatization quality.
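Exact-match accuracy is simple to compute once predictions and gold lemmas are aligned token by token. A minimal sketch in plain Python; `lemma_accuracy` is a hypothetical helper, and the two lemma lists are invented toy data, not output from any model. spaCy's built-in lemmatizer scorer reports the same kind of figure under the key `lemma_acc`.

```python
from typing import Sequence


def lemma_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of tokens whose predicted lemma exactly equals the gold lemma."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned token by token")
    if not gold:
        return 0.0
    # String equality is case-sensitive: 'The' and 'the' count as a mismatch.
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)


# Invented toy data: 8 tokens, 2 mismatches ('stripe' vs 'striped', 'feet' vs 'foot').
gold = ["the", "striped", "bat", "be", "hang", "on", "their", "foot"]
pred = ["the", "stripe", "bat", "be", "hang", "on", "their", "feet"]
print(lemma_accuracy(pred, gold))  # 0.75
```

Raising on misaligned inputs is deliberate: if tokenization differs between the gold data and the model, a silent `zip` would compare the wrong tokens.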
🔧 Debug
Level: advanced, time limit 2:00
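Why does this spaCy lemmatization code print numbers instead of lemmas?
Consider this code snippet:
import spacy
nlp = spacy.load('en_core_web_sm')
text = 'Cats running fast'
doc = nlp(text)
lemmas = [token.lemma for token in doc]
print(lemmas)
It runs without an error, so why does it print a list of integers rather than lemma strings?
💡 Hint
Check the attribute name for token lemmas in spaCy.
✅ Explanation
spaCy tokens expose the lemma in two forms: 'lemma_' (with a trailing underscore) is the lemma as a string, while 'lemma' (without it) is the integer hash ID of that string in the vocabulary's StringStore. Accessing 'token.lemma' is therefore valid and raises no AttributeError; it simply returns hash values. Use 'token.lemma_' to get the lemma text.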
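The difference between the two attributes can be seen without downloading a trained model. This sketch assumes only that spaCy v3 is installed: it uses a blank English pipeline, which tokenizes but does not lemmatize, so the lemma is set by hand before being read back both ways.

```python
import spacy

# A blank English pipeline needs no model download; it only tokenizes.
nlp = spacy.blank("en")
doc = nlp("Cats running fast")

# Set a lemma by hand, since a blank pipeline has no lemmatizer.
doc[0].lemma_ = "cat"

lemma_text = doc[0].lemma_  # str: the lemma text, "cat"
lemma_hash = doc[0].lemma   # int: hash ID of "cat" in the StringStore

# The StringStore maps the hash ID back to the original string.
print(lemma_text, lemma_hash, nlp.vocab.strings[lemma_hash])
```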
🧠 Conceptual
Level: expert, time limit 2:30
Why might spaCy lemmatization keep 'feet' as 'feet' instead of 'foot'?
Suppose spaCy lemmatizes the word 'feet' as 'feet' instead of the expected singular 'foot'. What is the most likely reason?
💡 Hint
Think about how dictionary-based lemmatizers handle irregular forms.
✅ Explanation
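spaCy's English lemmatizer combines suffix rules with exception tables, and it picks both by the token's part-of-speech tag. No suffix rule turns 'feet' into 'foot', so the correct lemma can only come from an exception entry. If the tagger assigns a tag whose table has no entry for 'feet', or the lookup data in use lacks that entry, the lemmatizer falls back to returning the word unchanged.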
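The fallback can be illustrated with a toy lemmatizer. This is a hypothetical illustration, not spaCy's code or data: `EXCEPTIONS`, `SUFFIX_RULES` and `toy_lemmatize` are made up for this example and hold only a handful of entries. It shows that 'feet' reaches 'foot' only through an exception entry for the right tag.

```python
from typing import Dict, List, Tuple

# Hypothetical, heavily simplified tables (NOT spaCy's actual data).
EXCEPTIONS: Dict[str, Dict[str, str]] = {
    "NOUN": {"feet": "foot", "mice": "mouse"},
    "VERB": {"are": "be", "is": "be"},
}
SUFFIX_RULES: Dict[str, List[Tuple[str, str]]] = {
    "NOUN": [("ies", "y"), ("s", "")],
    "VERB": [("ing", ""), ("ed", "")],
}


def toy_lemmatize(word: str, pos: str,
                  exceptions: Dict[str, Dict[str, str]] = EXCEPTIONS) -> str:
    word = word.lower()
    # 1. Irregular forms: only an explicit exception entry for this tag maps them.
    table = exceptions.get(pos, {})
    if word in table:
        return table[word]
    # 2. Regular forms: apply the first suffix rule that matches.
    for suffix, replacement in SUFFIX_RULES.get(pos, []):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)] + replacement
    # 3. Fallback: nothing matched, so the word is returned unchanged.
    return word


print(toy_lemmatize("bats", "NOUN"))                  # bat
print(toy_lemmatize("feet", "NOUN"))                  # foot
print(toy_lemmatize("feet", "ADJ"))                   # feet (tag has no table entry)
print(toy_lemmatize("feet", "NOUN", exceptions={}))   # feet (exception entry missing)
```

The last two calls reproduce the two failure paths described above: a tag that routes the word to a table without the entry, and lookup data that lacks the entry altogether.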