Bird
Raised Fist0
NLPml~10 mins

Lemmatization in spaCy in NLP - Interactive Code Practice

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Practice - 5 Tasks
Answer the questions below
1fill in blank
easy

Complete the code to load the English model in spaCy.

NLP
import spacy
nlp = spacy.[1]("en_core_web_sm")
Drag options to blanks, or click blank then click option'
Aload
Bparse
Cinstall
Ddownload
Attempts:
3 left
💡 Hint
Common Mistakes
Using 'download' instead of 'load' to get the model.
Trying to 'install' the model inside the code instead of loading it.
2fill in blank
medium

Complete the code to process the text with the loaded spaCy model.

NLP
doc = nlp([1])
Drag options to blanks, or click blank then click option'
Adoc
Bnlp
Cspacy
D"This is a test sentence."
Attempts:
3 left
💡 Hint
Common Mistakes
Passing the model object 'nlp' instead of a text string.
Passing an unprocessed variable instead of a string.
3fill in blank
hard

Fix the error in the code to get the lemma of each token.

NLP
lemmas = [token.[1] for token in doc]
Drag options to blanks, or click blank then click option'
Alemmatize
Blemma_
Clemma
Dtext
Attempts:
3 left
💡 Hint
Common Mistakes
Using 'lemma' without underscore which returns a token attribute object.
Trying to call a method 'lemmatize' which does not exist.
4fill in blank
hard

Fill both blanks to create a dictionary of tokens and their lemmas.

NLP
lemma_dict = [1]([2]: token.[3] for token in doc)
Drag options to blanks, or click blank then click option'
Adict
Blist
Clemma_
Dset
Attempts:
3 left
💡 Hint
Common Mistakes
Using 'list' or 'set' instead of 'dict' to create the dictionary.
Using 'lemma' without underscore for the lemma attribute.
5fill in blank
hard

Fill all three blanks to print each token and its lemma in a loop.

NLP
for token in doc:
    print(token.[1], token.[2], token.[3])
Drag options to blanks, or click blank then click option'
Atext
Blemma_
Cpos_
Dtag_
Attempts:
3 left
💡 Hint
Common Mistakes
Using 'lemma' or 'pos' without underscore which are not strings.
Mixing up 'tag_' and 'pos_' which represent different tag types.

Practice

(1/5)
1. What does lemmatization do in natural language processing using spaCy?
easy
A. It removes all punctuation from the text.
B. It counts the number of words in a sentence.
C. It finds the base or dictionary form of a word.
D. It translates text into another language.

Solution

  1. Step 1: Understand the purpose of lemmatization

    Lemmatization simplifies words by converting them to their base form, like 'running' to 'run'.
  2. Step 2: Compare options to definition

    Only It finds the base or dictionary form of a word. correctly describes finding the base or dictionary form of a word.
  3. Final Answer:

    It finds the base or dictionary form of a word. -> Option C
  4. Quick Check:

    Lemmatization = base form extraction [OK]
Hint: Lemmatization = find base word form [OK]
Common Mistakes:
  • Confusing lemmatization with token counting
  • Thinking it translates text
  • Mixing it up with punctuation removal
2. Which of the following is the correct way to get the lemma of a token in spaCy?
easy
A. token.lemma_
B. token.lemma
C. token.lemmatize()
D. token.get_lemma()

Solution

  1. Step 1: Recall spaCy token attribute for lemma

    spaCy uses the attribute lemma_ (with underscore) to get the lemma as a string.
  2. Step 2: Check each option

    token.lemma_ matches the correct attribute. token.lemma, token.lemmatize(), and token.get_lemma() are not valid spaCy syntax.
  3. Final Answer:

    token.lemma_ -> Option A
  4. Quick Check:

    spaCy lemma attribute = token.lemma_ [OK]
Hint: Use token.lemma_ with underscore for lemma string [OK]
Common Mistakes:
  • Using token.lemma without underscore
  • Trying to call a method like lemmatize()
  • Using non-existent methods like get_lemma()
3. Given the code snippet:
import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp('The cats are running fast')
lemmas = [token.lemma_ for token in doc]

What is the value of lemmas?
medium
A. ['the', 'cats', 'are', 'running', 'fast']
B. ['The', 'cats', 'are', 'running', 'fast']
C. ['The', 'cat', 'is', 'run', 'fast']
D. ['the', 'cat', 'be', 'run', 'fast']

Solution

  1. Step 1: Understand spaCy lemmatization output

    spaCy converts words to their base forms: 'cats' to 'cat', 'are' to 'be', 'running' to 'run', and lowercases 'The' to 'the'.
  2. Step 2: Match the list of lemmas

    ['the', 'cat', 'be', 'run', 'fast'] matches the expected lemmas: ['the', 'cat', 'be', 'run', 'fast'].
  3. Final Answer:

    ['the', 'cat', 'be', 'run', 'fast'] -> Option D
  4. Quick Check:

    spaCy lemma list = ['the', 'cat', 'be', 'run', 'fast'] [OK]
Hint: Lemmas are base forms, usually lowercase [OK]
Common Mistakes:
  • Expecting original words instead of lemmas
  • Not lowercasing lemmas
  • Confusing verb forms like 'are' with 'is'
4. Identify the error in this spaCy lemmatization code:
import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp('She was eating apples')
lemmas = [token.lemma for token in doc]
print(lemmas)
medium
A. Missing parentheses in spacy.load()
B. Using token.lemma instead of token.lemma_
C. Incorrect model name in spacy.load()
D. Missing import for lemmatizer

Solution

  1. Step 1: Check spaCy lemma attribute usage

    spaCy tokens have lemma_ (with underscore) for lemma string, not lemma.
  2. Step 2: Identify the error in code

    The code uses token.lemma which returns a property object, not the lemma string, causing wrong output.
  3. Final Answer:

    Using token.lemma instead of token.lemma_ -> Option B
  4. Quick Check:

    Use token.lemma_ for lemma string [OK]
Hint: Remember underscore in token.lemma_ for lemma [OK]
Common Mistakes:
  • Using token.lemma without underscore
  • Assuming spacy.load needs parentheses missing
  • Thinking model name is wrong
5. You want to lemmatize a list of sentences and count how many times the lemma 'run' appears using spaCy. Which code snippet correctly does this?
hard
A. import spacy nlp = spacy.load('en_core_web_sm') sentences = ['I run daily', 'He is running fast'] count = 0 for sent in sentences: doc = nlp(sent) count += sum(token.lemma_ == 'run' for token in doc) print(count)
B. import spacy nlp = spacy.load('en_core_web_sm') sentences = ['I run daily', 'He is running fast'] count = 0 for sent in sentences: doc = nlp(sent) count += sum(token.text == 'run' for token in doc) print(count)
C. import spacy nlp = spacy.load('en_core_web_sm') sentences = ['I run daily', 'He is running fast'] count = 0 for sent in sentences: doc = nlp(sent) count += sum(token.lemma == 'run' for token in doc) print(count)
D. import spacy nlp = spacy.load('en_core_web_sm') sentences = ['I run daily', 'He is running fast'] count = 0 for sent in sentences: doc = nlp(sent) count += sum(token.lemma_ == 'running' for token in doc) print(count)

Solution

  1. Step 1: Understand the goal and spaCy usage

    We want to count all tokens whose lemma is 'run', so we must use token.lemma_ and compare to 'run'.
  2. Step 2: Analyze each option

    import spacy nlp = spacy.load('en_core_web_sm') sentences = ['I run daily', 'He is running fast'] count = 0 for sent in sentences: doc = nlp(sent) count += sum(token.lemma_ == 'run' for token in doc) print(count) correctly uses token.lemma_ == 'run'. import spacy nlp = spacy.load('en_core_web_sm') sentences = ['I run daily', 'He is running fast'] count = 0 for sent in sentences: doc = nlp(sent) count += sum(token.text == 'run' for token in doc) print(count) compares original text, missing 'running'. import spacy nlp = spacy.load('en_core_web_sm') sentences = ['I run daily', 'He is running fast'] count = 0 for sent in sentences: doc = nlp(sent) count += sum(token.lemma == 'run' for token in doc) print(count) uses token.lemma without underscore, which is incorrect. import spacy nlp = spacy.load('en_core_web_sm') sentences = ['I run daily', 'He is running fast'] count = 0 for sent in sentences: doc = nlp(sent) count += sum(token.lemma_ == 'running' for token in doc) print(count) compares lemma to 'running', which is not the base form.
  3. Final Answer:

    sum(token.lemma_ == 'run' for token in doc) -> Option A
  4. Quick Check:

    Count lemma 'run' using token.lemma_ == 'run' [OK]
Hint: Compare token.lemma_ to base word for counting [OK]
Common Mistakes:
  • Comparing token.text instead of token.lemma_
  • Using token.lemma without underscore
  • Comparing lemma to non-base form like 'running'