When building custom pipeline components in NLP, the key metrics depend on the task the component performs. If the component classifies text, accuracy, precision, and recall measure how well it predicts the correct labels. If it extracts information, the F1 score balances precision and recall into a single measure of overall quality. These metrics show whether the component improves the pipeline.
Custom pipeline components in NLP - Model Metrics & Evaluation
| | Predicted Positive | Predicted Negative |
|-----------------|--------------------|--------------------|
| Actual Positive | True Positive (TP) = 50 | False Negative (FN) = 10 |
| Actual Negative | False Positive (FP) = 5 | True Negative (TN) = 35 |
Total samples = 50 + 10 + 5 + 35 = 100
Precision = TP / (TP + FP) = 50 / (50 + 5) ≈ 0.91
Recall = TP / (TP + FN) = 50 / (50 + 10) ≈ 0.83
F1 Score = 2 * (Precision * Recall) / (Precision + Recall) ≈ 0.87
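The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that uses only the four counts from the confusion matrix; the variable names are chosen for illustration, and accuracy is included for comparison even though it is not computed above.

```python
# Counts taken from the confusion matrix above.
tp, fn, fp, tn = 50, 10, 5, 35

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"total={total} accuracy={accuracy:.2f}")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Rounded to two decimals, the results match the hand calculation: precision 0.91, recall 0.83, F1 0.87.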
In custom NLP components, sometimes you want to catch as many true cases as possible (high recall), even if that produces some false positives. For example, a component detecting sensitive info should find all instances (high recall) to avoid leaks.
Other times, you want to be very sure when the component says "yes" (high precision). For example, a spam detector should not mark good emails as spam, so precision is key.
Balancing precision and recall depends on the use case. The F1 score helps find a good middle ground.
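One way to see the tradeoff is to move the decision threshold of a scoring component and recompute both metrics. The sketch below assumes a component that outputs a confidence score per example; the scores, labels, and the `precision_recall` helper are all hypothetical and exist only for illustration.

```python
# Hypothetical confidence scores and true labels (1 = positive class).
scores = [0.95, 0.90, 0.80, 0.65, 0.60, 0.40, 0.35, 0.20]
labels = [1, 1, 1, 0, 1, 1, 0, 0]

def precision_recall(threshold):
    """Return (precision, recall) when scores >= threshold count as positive."""
    predicted = [score >= threshold for score in scores]
    tp = sum(p and y == 1 for p, y in zip(predicted, labels))
    fp = sum(p and y == 0 for p, y in zip(predicted, labels))
    fn = sum((not p) and y == 1 for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# A low threshold favors recall; a high threshold favors precision.
low = precision_recall(0.3)
high = precision_recall(0.7)
print(f"threshold 0.3: precision={low[0]:.2f} recall={low[1]:.2f}")
print(f"threshold 0.7: precision={high[0]:.2f} recall={high[1]:.2f}")
```

With this toy data, the lower threshold finds every positive but admits false positives, while the higher threshold makes no false positives but misses some positives.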
- Good: Precision and recall above 0.8, showing the component finds most correct cases and makes few mistakes.
- Bad: Precision or recall below 0.5, meaning many wrong predictions or many missed cases.
- Accuracy: Can be misleading if classes are imbalanced. For example, 90% accuracy might be bad if the component misses all rare but important cases.
- Accuracy paradox: High accuracy but poor recall on rare classes.
- Data leakage: Training data accidentally includes test info, inflating metrics.
- Overfitting: Great metrics on training data but poor on new data.
- Ignoring class imbalance: Not using precision/recall or F1 when classes are uneven.
Your custom NLP component has 98% accuracy but only 12% recall on the important class. Is it good for production? Why or why not?
Answer: No, it is not good. The low recall means it misses most important cases, even though accuracy is high. This can cause serious problems if those cases matter. You should improve recall before using it in production.
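A small numeric sketch shows how 98% accuracy and 12% recall can coexist. The counts below are hypothetical, picked only so that they reproduce the two figures in the question; the scenario itself does not specify them.

```python
# Hypothetical counts: 25 important (positive) cases among 1100 samples.
tp, fn = 3, 22      # the component finds 3 of the 25 important cases
fp, tn = 0, 1075    # every unimportant case is labeled correctly

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2f} recall={recall:.2f} missed={fn}")
```

Because the important class is rare, the 22 missed cases barely move accuracy, which is why recall on that class is the number to watch.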
Practice
Solution
Step 1: Understand the role of pipeline components
Pipeline components process text step-by-step, modifying or analyzing it.
Step 2: Identify what custom components do
Custom components let you add your own processing steps that change the document or add data.
Final Answer:
To add your own processing steps that modify the document -> Option D
Quick Check:
Custom pipeline components = add processing steps [OK]
- Thinking custom components replace the whole model
- Confusing visualization with processing
- Assuming storage is part of pipeline components
Solution
Step 1: Recall the function signature for custom components
Custom components take a doc object and return it after processing.
Step 2: Check each option
def custom_component(doc): return doc matches the signature and returns the doc. Others either take wrong input or don't return doc.
Final Answer:
def custom_component(doc): return doc -> Option C
Quick Check:
Function takes doc and returns doc [OK]
- Using text instead of doc as input
- Not returning the doc object
- Missing the doc parameter
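As a minimal sketch of that signature in practice, the example below registers a pass-through component. It assumes spaCy v3, where a component is registered under a name with `@Language.component` and added to the pipeline by that name; the blank English pipeline and the sample sentence are illustrative choices.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

@Language.component("custom_component")
def custom_component(doc: Doc) -> Doc:
    # A component receives the Doc and must hand the same Doc back.
    return doc

# A blank English pipeline needs no model download.
nlp = spacy.blank("en")
nlp.add_pipe("custom_component", last=True)

doc = nlp("Custom components take a doc and return it.")
print(nlp.pipe_names)
```

The type hints are optional; they only make the doc-in, doc-out contract explicit.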
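# Assumes nlp is a loaded spaCy pipeline and the is_custom token extension is registered.
def add_custom_attr(doc):
    for token in doc:
        token._.is_custom = token.text.isalpha()
    return doc

# spaCy v2 call style; spaCy v3 expects the name of a registered component instead.
nlp.add_pipe(add_custom_attr, last=True)
text = 'Hello 123!'
doc = nlp(text)
print([token._.is_custom for token in doc])

What will be the printed output?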
Solution
Step 1: Analyze the tokens in the text
The text 'Hello 123!' splits into tokens: 'Hello', '123', '!'.
Step 2: Check the custom attribute logic
For each token, isalpha() returns True if all characters are letters. 'Hello' is True, '123' and '!' are False.
Final Answer:
[True, False, False] -> Option B
Quick Check:
isalpha() per token = [True, False, False] [OK]
- Assuming punctuation is alpha
- Counting tokens incorrectly
- Forgetting to return doc
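The per-token logic can be checked without a pipeline at all. This sketch hard-codes the three token texts given in the solution, so it verifies only the `isalpha()` step, not the tokenization.

```python
# Token texts as listed in the solution above (hard-coded, not produced by a tokenizer).
token_texts = ["Hello", "123", "!"]

# str.isalpha() is True only when every character is a letter.
is_custom = [text.isalpha() for text in token_texts]
print(is_custom)  # [True, False, False]
```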
def faulty_component(doc):
    for token in doc:
        token._.is_custom = token.text.isdigit()
    # Missing return statement

nlp.add_pipe(faulty_component, last=True)
Solution
Step 1: Check the function structure
The function loops over tokens and sets a custom attribute but does not return the doc.
Step 2: Recall pipeline component requirements
Custom components must return the doc object to continue the pipeline correctly.
Final Answer:
It does not return the doc object -> Option A
Quick Check:
Missing return doc causes pipeline failure [OK]
- Forgetting to return doc
- Using wrong attribute names without registration
- Adding component incorrectly
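To illustrate why the missing return matters, here is a toy pipeline in plain Python. It is not spaCy's implementation: the dict-based doc, the `run_pipeline` helper, and both components are hypothetical stand-ins that only mimic how each component's return value is handed to the next one.

```python
def set_flag(doc):
    doc["flag"] = True
    # Missing return: Python implicitly returns None here.

def count_tokens(doc):
    doc["n_tokens"] = len(doc["tokens"])
    return doc

def run_pipeline(doc, components):
    """Pass the output of each component to the next one."""
    for component in components:
        doc = component(doc)
        if doc is None:
            raise ValueError(f"{component.__name__} returned None instead of the doc")
    return doc

doc = {"tokens": ["Hello", "123", "!"]}
try:
    run_pipeline(doc, [set_flag, count_tokens])
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```

Adding `return doc` to `set_flag` lets the second component run, which is the same fix the faulty component above needs.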
doc._.uppercase_count. Which of the following is the correct approach?
Solution
Step 1: Understand extension registration
To add a new attribute to doc._, you must register a doc extension first.
Step 2: Implement counting and assignment
Count uppercase tokens in the component, assign the count to doc._.uppercase_count, then return doc.
Final Answer:
Register a doc extension for 'uppercase_count', define a component that counts uppercase tokens, assign the count to doc._.uppercase_count, and return doc -> Option A
Quick Check:
Doc extension + count + assign + return doc [OK]
- Not registering the doc extension before use
- Using token extension for doc-level data
- Not returning doc at the end
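The approach in the final answer might look like the sketch below. It assumes spaCy v3 and a blank English pipeline, and it reads "uppercase token" as a token whose text is entirely uppercase (`token.is_upper`); the component name `count_uppercase` and the sample sentence are illustrative.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Register the doc-level attribute before any component writes to it.
# force=True only keeps this sketch re-runnable in the same session.
Doc.set_extension("uppercase_count", default=0, force=True)

@Language.component("count_uppercase")
def count_uppercase(doc: Doc) -> Doc:
    # token.is_upper is True when the token text is entirely uppercase.
    doc._.uppercase_count = sum(1 for token in doc if token.is_upper)
    return doc

nlp = spacy.blank("en")
nlp.add_pipe("count_uppercase", last=True)

doc = nlp("NASA and ESA launched a probe")
print(doc._.uppercase_count)
```

Storing the count on the doc rather than on individual tokens matches the point in the list above: the value describes the whole document, so a doc extension is the natural place for it.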
