Custom NER training helps a computer find special words in text that matter to you. It learns to spot names, places, or things you care about.
Custom NER training basics in NLP
Introduction
Syntax
import spacy
from spacy.training.example import Example

# Load blank model
nlp = spacy.blank('en')

# Create NER component
ner = nlp.add_pipe('ner')

# Add labels
ner.add_label('CUSTOM_LABEL')

# Prepare training data
TRAIN_DATA = [
    ("Apple is a company", {"entities": [(0, 5, "CUSTOM_LABEL")]})
]

# Training loop
# In spaCy v3, nlp.initialize() replaces the deprecated nlp.begin_training()
optimizer = nlp.initialize()
for i in range(10):
    for text, annotations in TRAIN_DATA:
        doc = nlp.make_doc(text)
        example = Example.from_dict(doc, annotations)
        nlp.update([example], sgd=optimizer)

# Test
doc = nlp("Apple is big")
for ent in doc.ents:
    print(ent.text, ent.label_)
Use add_label to register each new entity label (category) the model should learn.
Each training example pairs a text with the start and end character positions and the label of every entity in it.
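Counting character positions by hand is error-prone. As a minimal sketch, the offsets can be computed from the text itself. Here `find_entity_span` is a hypothetical helper (not part of spaCy) that assumes the phrase occurs in the text and uses its first occurrence.

```python
def find_entity_span(text, phrase, label):
    """Return (start, end, label) for the first occurrence of phrase in text.

    Hypothetical helper, not a spaCy function. `end` is exclusive,
    matching the (start, end, label) tuples used in the training data.
    """
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in {text!r}")
    return (start, start + len(phrase), label)


text = "Apple is a company"
span = find_entity_span(text, "Apple", "CUSTOM_LABEL")

# Build one training example in the (text, {"entities": [...]}) format
TRAIN_DATA = [(text, {"entities": [span]})]
print(TRAIN_DATA)
```

If the same phrase appears more than once in a sentence, this sketch only labels the first occurrence, so repeated mentions would need their own spans.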
Examples
Add a new label:
ner.add_label('PRODUCT')

Mark an entity by its character positions:
TRAIN_DATA = [("I love Tesla cars", {"entities": [(7, 12, "ORG")]})]

Run several update steps on an example:
for i in range(5):
    nlp.update([example], sgd=optimizer)
Sample Model
This program trains a simple model to recognize 'Apple' as a fruit. It shows how to add a label, prepare data, train, and test.
import spacy
from spacy.training.example import Example

# Create blank English model
nlp = spacy.blank('en')

# Add NER pipe
ner = nlp.add_pipe('ner')

# Add custom label
ner.add_label('FRUIT')

# Training data with 'Apple' as FRUIT
TRAIN_DATA = [
    ("I like Apple", {"entities": [(7, 12, "FRUIT")]})
]

# Start training
# In spaCy v3, nlp.initialize() replaces the deprecated nlp.begin_training()
optimizer = nlp.initialize()

# Train for 10 iterations
for i in range(10):
    for text, annotations in TRAIN_DATA:
        doc = nlp.make_doc(text)
        example = Example.from_dict(doc, annotations)
        nlp.update([example], sgd=optimizer)

# Test the model
doc = nlp("Apple is tasty")
for ent in doc.ents:
    print(ent.text, ent.label_)
Important Notes
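Training a custom NER model needs enough examples to learn well. The one-sentence datasets on this page only demonstrate the API; a usable model generally needs many varied examples per label.
Positions in entities are character indexes into the text: start is inclusive and end is exclusive, so (0, 5) covers 'Apple' in "Apple is a company".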
Use a blank model to avoid confusion with existing labels.
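The note on positions can be checked mechanically before training. What follows is a rough sketch in plain Python, assuming the (text, {"entities": [...]}) format used above. `check_example` is a hypothetical helper, and its overlap rule rests on the assumption that the `ner` component expects entities that do not overlap.

```python
def check_example(text, annotations):
    """Return a list of problems found in one training example."""
    problems = []
    previous_end = 0
    for start, end, label in sorted(annotations.get("entities", [])):
        # Offsets must fall inside the text, and end is exclusive
        if not (0 <= start < end <= len(text)):
            problems.append(f"{label}: ({start}, {end}) is outside the text")
            continue
        covered = text[start:end]
        # A span that starts or ends on whitespace is usually an off-by-one error
        if covered != covered.strip():
            problems.append(f"{label}: {covered!r} includes surrounding whitespace")
        # Spans are sorted by start, so an overlap shows up as start < previous_end
        if start < previous_end:
            problems.append(f"{label}: ({start}, {end}) overlaps an earlier entity")
        previous_end = max(previous_end, end)
    return problems


good = ("I like Apple", {"entities": [(7, 12, "FRUIT")]})
bad = ("I like Apple", {"entities": [(6, 12, "FRUIT"), (7, 20, "FRUIT")]})

print(check_example(*good))
print(check_example(*bad))
```

Running a check like this over every example before training catches off-by-one spans early, when they are still easy to fix.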
Summary
Custom NER training teaches a model to find your special words.
You prepare text with labeled parts and train the model in loops.
After training, the model can spot your custom words in new text.
Practice
1. What is the main goal of custom NER training in NLP?
easy
Solution
Step 1: Understand what NER means
NER stands for Named Entity Recognition, which means finding and labeling specific words or phrases in text.
Step 2: Identify the purpose of custom training
Custom NER training teaches the model to find the words you have labeled. It does not perform general tasks like translation or summarization.
Final Answer:
To teach the model to recognize specific words or phrases you label -> Option B
Quick Check:
Custom NER = Recognize labeled words
Hint: Custom NER means teaching the model your special words
Common Mistakes:
- Confusing NER with translation or summarization
- Thinking NER generates new text
- Assuming NER works without labeled data
2. Which of the following is the correct way to label a sentence for custom NER training in Python spaCy format?
easy
Solution
Step 1: Check the labeling key
spaCy uses the 'entities' key, not 'labels', to hold labeled spans.
Step 2: Verify the span and label
Span (0,5) covers 'Apple' correctly, and label 'ORG' (organization) fits. A span like (6,7,'ORG') points to the wrong position, and 'PERSON' is incorrect for a company.
Final Answer:
('Apple is a company', {'entities': [(0, 5, 'ORG')]}) -> Option A
Quick Check:
Correct key and span = ('Apple is a company', {'entities': [(0, 5, 'ORG')]})
Hint: Use 'entities' key with correct span and label
Common Mistakes:
- Using 'labels' instead of 'entities'
- Incorrect character span for entity
- Wrong entity type label
3. Given this training data snippet for custom NER:
TRAIN_DATA = [
('I love Paris', {'entities': [(7, 12, 'GPE')]})
]
What will the model predict for the sentence 'I love Paris' after training?
medium
Solution
Step 1: Understand the labeled entity
The training data labels 'Paris' from character 7 to 12 as 'GPE' (Geopolitical entity).
Step 2: Predict model output after training
The model learns to recognize 'Paris' as 'GPE' and should predict [('Paris', 'GPE')] for the same sentence.
Final Answer:
[('Paris', 'GPE')] -> Option C
Quick Check:
Entity span matches 'Paris' = [('Paris', 'GPE')]
Hint: Model predicts labeled spans from training data
Common Mistakes:
- Confusing entity span with other words
- Expecting no entities if training is done
- Mixing entity labels
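The Quick Check for this question can be reproduced without training anything: slicing the text with each annotated span gives the (text, label) pairs the model is being taught to return for that sentence. This is a minimal sketch in plain Python. `expected_entities` is a hypothetical helper name, and it describes the training target, not a guaranteed model prediction.

```python
TRAIN_DATA = [
    ('I love Paris', {'entities': [(7, 12, 'GPE')]})
]


def expected_entities(text, annotations):
    """Turn (start, end, label) spans into (entity_text, label) pairs."""
    return [
        (text[start:end], label)
        for start, end, label in annotations['entities']
    ]


for text, annotations in TRAIN_DATA:
    # Characters 7 to 12 (end exclusive) of 'I love Paris' are 'Paris'
    print(expected_entities(text, annotations))
```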
4. You wrote this code to add a new entity label to your NER model:
ner.add_label('ANIMAL')
But after training, the model never detects 'ANIMAL' entities. What is the most likely mistake?
medium
Solution
Step 1: Check the method usage
ner.add_label('ANIMAL') is the correct way to add a new label. There is no add_entity() method, there is no need to call remove_label first, and 'ANIMAL' is not reserved.
Step 2: Verify training data
The model learns from examples. Without training examples labeled 'ANIMAL', the model cannot detect it.
Final Answer:
You forgot to include training examples with 'ANIMAL' labels -> Option D
Quick Check:
Training data needed for new labels = You forgot to include training examples with 'ANIMAL' labels
Hint: Add labeled examples for new entity labels
Common Mistakes:
- Assuming adding label alone trains model
- Using wrong method names
- Thinking labels are reserved keywords
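A quick way to catch this mistake is to compare the labels that were added with the labels that actually appear in the training data. The sketch below is plain Python under assumed inputs: `added_labels` stands in for whatever was passed to ner.add_label(), and the one-example dataset is hypothetical.

```python
from collections import Counter

# Hypothetical setup: 'ANIMAL' was added with ner.add_label(),
# but the training data only contains 'GPE' annotations.
added_labels = {'ANIMAL', 'GPE'}
TRAIN_DATA = [
    ('I love Paris', {'entities': [(7, 12, 'GPE')]}),
]

# Count how many annotated spans each label has
label_counts = Counter(
    label
    for _text, annotations in TRAIN_DATA
    for _start, _end, label in annotations['entities']
)

# Labels the model was told about but never sees an example of
labels_without_examples = sorted(added_labels - set(label_counts))
print(labels_without_examples)
```

Any label listed here has no examples to learn from, so the model has no basis for predicting it.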
5. You want to train a custom NER model to recognize two new entity types: 'FOOD' and 'DRINK'. You have labeled training data for both. Which of the following is the best approach to ensure the model learns both correctly?
hard
Solution
Step 1: Add all new labels before training
Adding both 'FOOD' and 'DRINK' labels upfront ensures the model knows what to learn.
Step 2: Provide balanced training data and train iteratively
Balanced examples for both labels and multiple training loops help the model learn both well.
Final Answer:
Add both labels with ner.add_label(), include balanced training examples for each, and train in multiple iterations -> Option A
Quick Check:
All labels + balanced data + training = Add both labels with ner.add_label(), include balanced training examples for each, and train in multiple iterations
Hint: Add all labels and balanced data before training
Common Mistakes:
- Adding labels one by one with separate training
- Skipping label addition
- Training with unbalanced or missing examples
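Whether the data is "balanced" can be estimated by counting annotated spans per label before training. This is a rough sketch in plain Python. The toy sentences, the `label_balance` helper, and the `MAX_RATIO` threshold are all assumptions made for illustration, not spaCy features or recommended values.

```python
from collections import Counter

# Hypothetical toy data; real training sets need far more examples per label.
TRAIN_DATA = [
    ("I ate pizza", {"entities": [(6, 11, "FOOD")]}),
    ("She ordered pasta", {"entities": [(12, 17, "FOOD")]}),
    ("We had rice", {"entities": [(7, 11, "FOOD")]}),
    ("He drinks tea", {"entities": [(10, 13, "DRINK")]}),
]


def label_balance(train_data, labels):
    """Count annotated spans for each expected label (0 if never seen)."""
    counts = Counter(
        label
        for _text, annotations in train_data
        for _start, _end, label in annotations["entities"]
    )
    return {label: counts.get(label, 0) for label in labels}


counts = label_balance(TRAIN_DATA, ["FOOD", "DRINK"])
print(counts)

# Assumed rule of thumb: flag the data if one label has more than
# MAX_RATIO times as many spans as another, or if a label has none.
MAX_RATIO = 2
least, most = min(counts.values()), max(counts.values())
is_balanced = least > 0 and most / least <= MAX_RATIO
print(is_balanced)
```

In this toy dataset 'FOOD' has three spans and 'DRINK' has one, so the check flags it; adding more 'DRINK' sentences would be the natural fix.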
