Challenge - 5 Problems
Custom Pipeline Master
❓ Predict Output
Level: intermediate
Output of a simple custom pipeline component
What will be the output of the following code that adds a custom component to a spaCy pipeline which counts tokens?
import spacy
from spacy.language import Language
from spacy.tokens import Doc

@Language.component('token_counter')
def token_counter(doc):
    doc._.token_count = len(doc)
    return doc

nlp = spacy.blank('en')

# Register extension attribute
Doc.set_extension('token_count', default=0)

nlp.add_pipe('token_counter')
doc = nlp('Hello world! This is a test.')
print(doc._.token_count)
💡 Hint
Count the number of tokens in the sentence including punctuation.
Explanation
The tokenizer splits 'Hello world! This is a test.' into 8 tokens: 'Hello', 'world', '!', 'This', 'is', 'a', 'test', '.'. Punctuation marks count as separate tokens. The component stores len(doc) in doc._.token_count, so the code prints 8.
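To check the count by hand, the token texts can be listed directly. This is a minimal sketch that assumes spaCy v3 is installed; the variable name `token_texts` is introduced here for illustration only.

```python
import spacy

# A blank English pipeline contains only the tokenizer, so no model download is needed.
nlp = spacy.blank("en")
doc = nlp("Hello world! This is a test.")

# Collect the text of every token, punctuation included.
token_texts = [token.text for token in doc]
print(token_texts)  # ['Hello', 'world', '!', 'This', 'is', 'a', 'test', '.']
print(len(doc))     # 8
```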
❓ Model Choice
Level: intermediate
Choosing the right custom pipeline component for sentiment analysis
You want to add a custom pipeline component that assigns a sentiment score to each document. Which component design is best?
💡 Hint
Remember that pipeline components must return a Doc object.
Explanation
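A custom component should add information to the Doc, for example by storing the sentiment score in a custom extension attribute, and then return that Doc. Returning None or replacing the Doc with a string breaks the pipeline, because the next component expects a Doc. Deleting tokens is destructive and not recommended for sentiment scoring.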
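One way such a component might look is sketched below. The lexicon `TOY_LEXICON`, the component name `toy_sentiment`, and the extension name `sentiment_score` are all hypothetical and exist only for this illustration; a real scorer would use a trained model or a proper lexicon.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Hypothetical word scores, for illustration only.
TOY_LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0}

# Register the custom attribute once (guarded so the snippet can be re-run).
if not Doc.has_extension("sentiment_score"):
    Doc.set_extension("sentiment_score", default=0.0)

@Language.component("toy_sentiment")
def toy_sentiment(doc):
    # Sum the scores of known words; unknown words contribute 0.
    doc._.sentiment_score = sum(TOY_LEXICON.get(t.lower_, 0.0) for t in doc)
    # Always return the Doc so later components keep working.
    return doc

nlp = spacy.blank("en")
nlp.add_pipe("toy_sentiment")
doc = nlp("The food was great.")
print(doc._.sentiment_score)  # 1.0
```

The component only annotates the Doc; tokens and existing annotations are left untouched.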
❓ Hyperparameter
Level: advanced
Setting hyperparameters in a custom spaCy pipeline component
You want to create a custom pipeline component that filters tokens by a minimum length parameter. How should you pass this parameter to the component?
💡 Hint
Think about how spaCy components are registered and initialized.
Explanation
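In spaCy v3, a configurable component is registered as a factory with @Language.factory. The factory function receives nlp, name, and any settings (such as the minimum length) and returns the component callable. The settings are supplied through the config argument of nlp.add_pipe, with fallback values declared in default_config.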
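A minimal sketch of a parameterized component follows, assuming spaCy v3. The factory name `length_filter`, the extension `long_tokens`, and the default of 3 are choices made for this example, not part of the original question. Because a Doc's tokens are not removed here, the component records the qualifying token texts in an extension attribute instead.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Extension that will hold the texts of tokens passing the length filter.
if not Doc.has_extension("long_tokens"):
    Doc.set_extension("long_tokens", default=None)

@Language.factory("length_filter", default_config={"min_length": 3})
def create_length_filter(nlp, name, min_length: int):
    # The returned closure is the actual component; it captures min_length.
    def length_filter(doc):
        doc._.long_tokens = [t.text for t in doc if len(t.text) >= min_length]
        return doc
    return length_filter

nlp = spacy.blank("en")
# Override the default setting when adding the component to the pipeline.
nlp.add_pipe("length_filter", config={"min_length": 4})
doc = nlp("Hello world! This is a test.")
print(doc._.long_tokens)  # ['Hello', 'world', 'This', 'test']
```

Keeping the setting in `config` means it is saved with the pipeline configuration rather than hard-coded in the function body.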
🔧 Debug
Level: advanced
Debugging a custom pipeline component that raises an error
Consider this custom component code snippet:
@Language.component('uppercase_tokens')
def uppercase_tokens(doc):
    for token in doc:
        token.text = token.text.upper()
    return doc
Why does this code raise an error when added to the pipeline?
💡 Hint
Check if token.text can be changed directly.
Explanation
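In spaCy, token.text is read-only, so assigning to it raises an AttributeError. To change the text, build a new Doc from the modified words, or store the transformed string in a writable attribute such as token.norm_ or a custom extension attribute.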
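The sketch below shows the failing assignment and one possible workaround: a component that returns a new Doc built from the upper-cased words. The component name `uppercase_doc` and the flag `assignment_failed` are hypothetical. Note that a freshly built Doc does not carry over annotations from the original, so this approach fits best early in a pipeline.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Demonstrate that token.text cannot be assigned to.
plain_doc = nlp("Hello world!")
assignment_failed = False
try:
    plain_doc[0].text = "HELLO"
except AttributeError:
    assignment_failed = True

@Language.component("uppercase_doc")
def uppercase_doc(doc):
    # Build a new Doc from upper-cased words, preserving the original spacing.
    words = [t.text.upper() for t in doc]
    spaces = [bool(t.whitespace_) for t in doc]
    return Doc(doc.vocab, words=words, spaces=spaces)

nlp.add_pipe("uppercase_doc")
doc = nlp("Hello world!")
print(doc.text)  # HELLO WORLD!
```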
🧠 Conceptual
Level: expert
Understanding the order of custom pipeline components
You have two custom components: one that lemmatizes tokens and another that filters out stop words. In which order should you add them to the pipeline for best results?
💡 Hint
Think about how lemmatization affects token forms before filtering.
Explanation
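Lemmatization should run before stop-word filtering. A filter that runs afterwards can match on the lemma or another normalized form, so inflected variants are handled consistently. Filtering first can discard tokens before they are lemmatized, which also removes context that a lemmatizer may rely on.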
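The ordering can be sketched with two toy components. Everything named here is an assumption for illustration: `TOY_LEMMAS` is a hypothetical lookup table standing in for a real lemmatizer, and `content_lemmas` is a made-up extension attribute. The filter collects lemmas of non-stop tokens instead of deleting tokens from the Doc.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Hypothetical lemma table, standing in for a real lemmatizer.
TOY_LEMMAS = {"dogs": "dog", "running": "run", "were": "be"}

if not Doc.has_extension("content_lemmas"):
    Doc.set_extension("content_lemmas", default=None)

@Language.component("toy_lemmatizer")
def toy_lemmatizer(doc):
    # token.lemma_ is writable, unlike token.text.
    for token in doc:
        token.lemma_ = TOY_LEMMAS.get(token.lower_, token.lower_)
    return doc

@Language.component("stop_word_filter")
def stop_word_filter(doc):
    # Relies on lemmas already being set by the previous component.
    doc._.content_lemmas = [
        t.lemma_ for t in doc if not t.is_stop and not t.is_punct
    ]
    return doc

nlp = spacy.blank("en")
nlp.add_pipe("toy_lemmatizer")
# Make the ordering explicit: the filter runs after the lemmatizer.
nlp.add_pipe("stop_word_filter", after="toy_lemmatizer")

doc = nlp("The dogs were running.")
print(nlp.pipe_names)
print(doc._.content_lemmas)
```

If the two `add_pipe` calls were swapped, the filter would read `lemma_` before the lemmatizer had set it.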