Custom pipeline components in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of a simple custom pipeline component
What will the following code print? It adds a custom component that counts tokens to a blank spaCy pipeline.
import spacy
from spacy.language import Language

@Language.component('token_counter')
def token_counter(doc):
    doc._.token_count = len(doc)
    return doc

nlp = spacy.blank('en')

# Register extension attribute
from spacy.tokens import Doc
Doc.set_extension('token_count', default=0)

nlp.add_pipe('token_counter')
doc = nlp('Hello world! This is a test.')
print(doc._.token_count)
A. SyntaxError
B. 5
C. 6
D. 8
💡 Hint
Count the number of tokens in the sentence including punctuation.
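One way to check a count like this is to print the tokens directly. The sketch below assumes spaCy v3 is installed; spacy.blank('en') needs no downloaded model, because it contains only the English tokenizer.

```python
import spacy

# A blank English pipeline has a tokenizer but no trained components
nlp = spacy.blank("en")
doc = nlp("Hello world! This is a test.")

# Punctuation such as "!" and "." is split off into its own token
tokens = [token.text for token in doc]
print(tokens)    # ['Hello', 'world', '!', 'This', 'is', 'a', 'test', '.']
print(len(doc))  # 8
```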
Model Choice
intermediate
Choosing the right custom pipeline component for sentiment analysis
You want to add a custom pipeline component that assigns a sentiment score to each document. Which component design is best?
A. A component that deletes tokens from the Doc to keep only positive words.
B. A component that modifies the Doc object by adding a custom attribute with the sentiment score.
C. A component that returns None instead of a Doc object.
D. A component that replaces the Doc object with a string containing the sentiment score.
💡 Hint
Remember that pipeline components must return a Doc object.
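A minimal sketch of option B, assuming spaCy v3. The word lists POSITIVE and NEGATIVE, the extension name sentiment and the component name toy_sentiment are hypothetical, chosen only for illustration; a real sentiment component would rely on a trained model or an established lexicon rather than six hard-coded words.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Hypothetical toy lexicon, for illustration only
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}

# Register the doc-level attribute once (the guard makes re-running safe)
if not Doc.has_extension("sentiment"):
    Doc.set_extension("sentiment", default=0.0)

@Language.component("toy_sentiment")
def toy_sentiment(doc):
    pos = sum(1 for t in doc if t.lower_ in POSITIVE)
    neg = sum(1 for t in doc if t.lower_ in NEGATIVE)
    total = pos + neg
    # Score in [-1, 1]; 0.0 when no lexicon word appears
    doc._.sentiment = (pos - neg) / total if total else 0.0
    # The Doc itself is left intact and passed on to the next component
    return doc

nlp = spacy.blank("en")
nlp.add_pipe("toy_sentiment")
doc = nlp("I love this great phone, but the battery is bad.")
print(doc._.sentiment)
```

Because the component only writes to doc._.sentiment, every token and the original text remain available to later components, which is what options A, C and D would break.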
Hyperparameter
advanced
Setting hyperparameters in a custom spaCy pipeline component
You want to create a custom pipeline component that filters tokens by a minimum length parameter. How should you pass this parameter to the component?
A. Define the component as a factory function that accepts the parameter and returns the actual component function.
B. Hardcode the minimum length inside the component function without parameters.
C. Pass the parameter as a global variable outside the component.
D. Set the parameter inside the Doc object before processing.
💡 Hint
Think about how spaCy components are registered and initialized.
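In spaCy v3, the factory pattern from option A is expressed with @Language.factory and a default_config. The sketch below is an illustration under assumptions: the factory name min_length_filter and the extension long_tokens are hypothetical. Because tokens cannot be removed from an existing Doc, this version records the tokens that pass the length check instead of deleting the others.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Hypothetical doc-level attribute that stores the tokens passing the filter
if not Doc.has_extension("long_tokens"):
    Doc.set_extension("long_tokens", default=None)

@Language.factory("min_length_filter", default_config={"min_length": 3})
def create_min_length_filter(nlp: Language, name: str, min_length: int):
    # spaCy calls this factory once and passes the configured min_length
    def min_length_filter(doc):
        # A Doc's tokens cannot be deleted, so record the survivors instead
        doc._.long_tokens = [t.text for t in doc if len(t.text) >= min_length]
        return doc

    return min_length_filter

nlp = spacy.blank("en")
# config= overrides the default_config value for this pipeline only
nlp.add_pipe("min_length_filter", config={"min_length": 4})
doc = nlp("A tiny example with short words")
print(doc._.long_tokens)  # ['tiny', 'example', 'with', 'short', 'words']
```

The same registered factory can be added to different pipelines with different min_length values, without global variables or hardcoded constants.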
🔧 Debug
advanced
Debugging a custom pipeline component that raises an error
Consider this custom component code snippet:
@Language.component('uppercase_tokens')
def uppercase_tokens(doc):
    for token in doc:
        token.text = token.text.upper()
    return doc
Why does this code raise an error when the pipeline processes a text?
A. Because token.text is a read-only property and cannot be assigned to.
B. Because the function does not return a Doc object.
C. Because the pipeline does not support loops over tokens.
D. Because the component name is invalid.
💡 Hint
Check if token.text can be changed directly.
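One way to fix the component, sketched for spaCy v3: leave token.text untouched and store the uppercased form in a custom token attribute. The extension name upper_text and the component name uppercase_tokens_fixed are hypothetical. The last lines show that the direct assignment from the snippet is rejected with an AttributeError.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Token

# Hypothetical extension used to store the uppercased form
if not Token.has_extension("upper_text"):
    Token.set_extension("upper_text", default=None)

@Language.component("uppercase_tokens_fixed")
def uppercase_tokens_fixed(doc):
    for token in doc:
        # token.text is read-only, so write to a custom attribute instead
        token._.upper_text = token.text.upper()
    return doc

nlp = spacy.blank("en")
nlp.add_pipe("uppercase_tokens_fixed")
doc = nlp("Hello world")
print([t._.upper_text for t in doc])  # ['HELLO', 'WORLD']

# The original assignment fails because Token.text has no setter
try:
    doc[0].text = "HELLO"
    text_is_writable = True
except AttributeError:
    text_is_writable = False
print(text_is_writable)  # False
```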
🧠 Conceptual
expert
Understanding the order of custom pipeline components
You have two custom components: one that lemmatizes tokens and another that filters out stop words. In which order should you add them to the pipeline for best results?
A. Order does not matter; both can be added in any sequence.
B. Add the stop word filter first, then the lemmatizer.
C. Add the lemmatizer first, then the stop word filter.
D. Add both components simultaneously using add_pipe with the same name.
💡 Hint
Think about how lemmatization affects token forms before filtering.
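The reasoning behind option C can be sketched with two toy components, assuming spaCy v3. Everything here is hypothetical: TOY_LEMMAS stands in for a real lemmatizer, and BASE_FORM_STOPS is a stop list written in base forms, so a word such as "were" only matches after it has been reduced to "be". spaCy's own lemmatizer is not used.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Hypothetical toy lemma table and a stop list written in base forms
TOY_LEMMAS = {"cats": "cat", "were": "be", "running": "run"}
BASE_FORM_STOPS = {"the", "be"}

if not Doc.has_extension("content_lemmas"):
    Doc.set_extension("content_lemmas", default=None)

def base_form(token):
    # Fall back to the lowercase text when no lemma has been set yet
    return token.lemma_ or token.lower_

@Language.component("toy_lemmatizer")
def toy_lemmatizer(doc):
    for token in doc:
        # Unlike token.text, token.lemma_ is writable
        token.lemma_ = TOY_LEMMAS.get(token.lower_, token.lower_)
    return doc

@Language.component("base_form_stop_filter")
def base_form_stop_filter(doc):
    doc._.content_lemmas = [
        base_form(t) for t in doc if base_form(t) not in BASE_FORM_STOPS
    ]
    return doc

nlp = spacy.blank("en")
nlp.add_pipe("toy_lemmatizer")
nlp.add_pipe("base_form_stop_filter", after="toy_lemmatizer")
doc = nlp("The cats were running")
print(doc._.content_lemmas)  # ['cat', 'run']
```

If you add base_form_stop_filter before toy_lemmatizer instead, no lemma has been assigned when the filter runs, so "were" no longer matches "be" and the filter keeps ['cats', 'were', 'running'].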

Practice

1. What is the main purpose of a custom pipeline component in an NLP pipeline?
easy
A. To store the processed documents in a database
B. To replace the entire NLP model with a new one
C. To visualize the text data in charts
D. To add your own processing steps that modify the document

Solution

  1. Step 1: Understand the role of pipeline components

    Pipeline components process text step-by-step, modifying or analyzing it.
  2. Step 2: Identify what custom components do

    Custom components let you add your own processing steps that change the document or add data.
  3. Final Answer:

    To add your own processing steps that modify the document -> Option D
  4. Quick Check:

    Custom pipeline components = add processing steps [OK]
Hint: Custom components add steps that change the document [OK]
Common Mistakes:
  • Thinking custom components replace the whole model
  • Confusing visualization with processing
  • Assuming storage is part of pipeline components
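A minimal sketch of option D in practice, assuming spaCy v3: the custom step is added alongside a built-in component rather than replacing anything. The component name mark_questions and the extension has_question are hypothetical.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

if not Doc.has_extension("has_question"):
    Doc.set_extension("has_question", default=False)

@Language.component("mark_questions")
def mark_questions(doc):
    # Extra processing step: flag documents that contain a question mark
    doc._.has_question = any(token.text == "?" for token in doc)
    return doc

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")                # built-in component stays in place
nlp.add_pipe("mark_questions", last=True)  # custom step is appended after it
print(nlp.pipe_names)  # ['sentencizer', 'mark_questions']

doc = nlp("Is this working? It should be.")
print(doc._.has_question)  # True
```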
2. Which of the following is the correct way to define a custom pipeline component function in Python?
easy
A. def custom_component(text): return text
B. def custom_component(doc): print(doc)
C. def custom_component(doc): return doc
D. def custom_component(): return None

Solution

  1. Step 1: Recall the function signature for custom components

    Custom components take a doc object and return it after processing.
  2. Step 2: Check each option

    def custom_component(doc): return doc matches the signature and returns the doc. Option A takes raw text instead of a Doc, option B prints the doc but implicitly returns None, and option D takes no argument at all.
  3. Final Answer:

    def custom_component(doc): return doc -> Option C
  4. Quick Check:

    Function takes doc and returns doc [OK]
Hint: Custom component functions take and return doc objects [OK]
Common Mistakes:
  • Using text instead of doc as input
  • Not returning the doc object
  • Missing the doc parameter
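The doc-in, doc-out signature from option C is all spaCy v3 needs once the function is registered. As a sketch, the function can also be registered without the decorator by passing it to Language.component through the func argument; the component name reuses custom_component from the options and is otherwise arbitrary.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

def custom_component(doc):
    # Takes a Doc, may change it, and returns the same Doc
    return doc

# Registering without the decorator: pass the function via func=
Language.component("custom_component", func=custom_component)

nlp = spacy.blank("en")
nlp.add_pipe("custom_component")
doc = nlp("Pipelines pass a Doc from one component to the next.")
print(isinstance(doc, Doc))  # True
```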
3. Given this custom component code:
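import spacy
from spacy.language import Language
from spacy.tokens import Token

# Register the custom token attribute before any component writes to it
Token.set_extension('is_custom', default=False)

@Language.component('add_custom_attr')
def add_custom_attr(doc):
    for token in doc:
        token._.is_custom = token.text.isalpha()
    return doc

nlp = spacy.blank('en')
nlp.add_pipe('add_custom_attr', last=True)

text = 'Hello 123!'
doc = nlp(text)
print([token._.is_custom for token in doc])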

What will be the printed output?
medium
A. [True, True, False]
B. [True, False, False]
C. [True, False, True]
D. [False, False, False]

Solution

  1. Step 1: Analyze the tokens in the text

    The text 'Hello 123!' splits into tokens: 'Hello', '123', '!'.
  2. Step 2: Check the custom attribute logic

    For each token, isalpha() returns True if all characters are letters. 'Hello' is True, '123' and '!' are False.
  3. Final Answer:

    [True, False, False] -> Option B
  4. Quick Check:

    isalpha() per token = [True, False, False] [OK]
Hint: Check token text with isalpha() for True/False [OK]
Common Mistakes:
  • Assuming punctuation is alpha
  • Counting tokens incorrectly
  • Forgetting to return doc
4. What is wrong with this custom pipeline component code?
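from spacy.language import Language

# Assumes Token.set_extension('is_custom', default=False) was called earlier
@Language.component('faulty_component')
def faulty_component(doc):
    for token in doc:
        token._.is_custom = token.text.isdigit()
    # Missing return statement

nlp.add_pipe('faulty_component', last=True)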
medium
A. It does not return the doc object
B. It uses an invalid attribute name
C. It modifies tokens outside the loop
D. It should not be added to the pipeline

Solution

  1. Step 1: Check the function structure

    The function loops over tokens and sets a custom attribute but does not return the doc.
  2. Step 2: Recall pipeline component requirements

    Custom components must return the doc object. Without a return statement the function returns None, and spaCy raises an error because the component produced None instead of a Doc.
  3. Final Answer:

    It does not return the doc object -> Option A
  4. Quick Check:

    Missing return doc causes pipeline failure [OK]
Hint: Always return doc at end of custom component [OK]
Common Mistakes:
  • Forgetting to return doc
  • Using wrong attribute names without registration
  • Adding component incorrectly
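The failure can be reproduced with a short sketch, assuming spaCy v3. The component names and the example text are hypothetical; the is_custom extension is registered up front so that the only difference between the two components is the missing return.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Token

if not Token.has_extension("is_custom"):
    Token.set_extension("is_custom", default=False)

@Language.component("faulty_component_demo")
def faulty_component_demo(doc):
    for token in doc:
        token._.is_custom = token.text.isdigit()
    # No return statement: Python returns None implicitly

@Language.component("fixed_component_demo")
def fixed_component_demo(doc):
    for token in doc:
        token._.is_custom = token.text.isdigit()
    return doc  # hand the Doc to the next component

broken = spacy.blank("en")
broken.add_pipe("faulty_component_demo")
try:
    broken("Order 66")
    raised = False
except ValueError:
    # spaCy checks each component's return value and rejects non-Doc results
    raised = True
print(raised)  # True

fixed = spacy.blank("en")
fixed.add_pipe("fixed_component_demo")
print([t._.is_custom for t in fixed("Order 66")])  # [False, True]
```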
5. You want to create a custom pipeline component that counts how many tokens in a document are uppercase and stores this count as doc._.uppercase_count. Which of the following is the correct approach?
hard
A. Register a doc extension for 'uppercase_count', define a component that counts uppercase tokens, assign the count to doc._.uppercase_count, and return doc
B. Add a token extension for 'uppercase_count' and count uppercase tokens per token
C. Modify tokens in place without registering any extension and return doc
D. Create a new NLP model that outputs uppercase counts directly

Solution

  1. Step 1: Understand extension registration

    To add a new attribute to doc._, you must register a doc extension first.
  2. Step 2: Implement counting and assignment

    Count uppercase tokens in the component, assign the count to doc._.uppercase_count, then return doc.
  3. Final Answer:

    Register a doc extension for 'uppercase_count', define a component that counts uppercase tokens, assign the count to doc._.uppercase_count, and return doc -> Option A
  4. Quick Check:

    Doc extension + count + assign + return doc [OK]
Hint: Register doc extension before assigning custom doc attributes [OK]
Common Mistakes:
  • Not registering the doc extension before use
  • Using token extension for doc-level data
  • Not returning doc at the end
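The three steps of option A can be sketched as follows, assuming spaCy v3. The component name uppercase_counter and the sample sentence are hypothetical; Token.is_upper is spaCy's built-in flag for tokens whose text is all uppercase.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Step 1: register the doc-level extension before any component uses it
if not Doc.has_extension("uppercase_count"):
    Doc.set_extension("uppercase_count", default=0)

@Language.component("uppercase_counter")
def uppercase_counter(doc):
    # Step 2: count tokens whose text is entirely uppercase
    doc._.uppercase_count = sum(1 for token in doc if token.is_upper)
    # Step 3: return the Doc so the pipeline can continue
    return doc

nlp = spacy.blank("en")
nlp.add_pipe("uppercase_counter")
doc = nlp("NASA and ESA met in PARIS")
print(doc._.uppercase_count)  # 3
```

Because the count describes the whole document, it lives on Doc rather than Token, which is why option B's token extension is the wrong fit.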