How to Train a Custom NER Model with spaCy

To train a custom Named Entity Recognition (NER) model in spaCy, prepare labeled training data with entity spans, start from a blank or existing nlp pipeline, add the ner component and your labels, and train by looping over the data and calling nlp.update with an optimizer. Finally, save the trained model and test its predictions.

Syntax

The main steps to train a custom NER model in spaCy include:

  • nlp = spacy.blank('en'): Create a blank English model or load an existing one.
  • ner = nlp.add_pipe('ner'): Add the NER component to the pipeline.
  • ner.add_label('LABEL'): Add your custom entity labels.
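  • optimizer = nlp.initialize(): Initialize the model weights and return the optimizer (in spaCy v3 this replaces the deprecated nlp.begin_training()).
  • nlp.update(examples, losses=losses, drop=0.5, sgd=optimizer): Train the model by updating it with a batch of Example objects built from your training data.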
  • nlp.to_disk('model_path'): Save the trained model to disk.
python
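import spacy
from spacy.training import Example

# Placeholder training data: (text, {"entities": [(start, end, label)]})
TRAIN_DATA = [
    ("Apple is hiring in London", {"entities": [(0, 5, "ORG")]}),
]

# Create blank English model
nlp = spacy.blank('en')

# Add NER component
ner = nlp.add_pipe('ner')

# Add custom labels
ner.add_label('ORG')

# Initialize the model weights and get the optimizer
optimizer = nlp.initialize()

# Example training loop (simplified)
for itn in range(10):
    losses = {}
    for text, annotations in TRAIN_DATA:
        # spaCy v3 expects Example objects, not raw text and annotations
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], drop=0.5, sgd=optimizer, losses=losses)
    print(losses)

# Save model
nlp.to_disk('custom_ner_model')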

Example

This example shows how to train a custom NER model with spaCy on a small dataset with one label ANIMAL. It trains the model for 20 iterations and tests it on a sample sentence.

python
import spacy
from spacy.training.example import Example

# Training data: text and entities with start/end character offsets and label
# (end is exclusive, so text[start:end] must equal the entity text)
TRAIN_DATA = [
    ("I have a dog", {"entities": [(9, 12, "ANIMAL")] }),
    ("She owns a cat", {"entities": [(11, 14, "ANIMAL")] }),
    ("They saw a rabbit", {"entities": [(11, 17, "ANIMAL")] })
]

# Create blank English model
nlp = spacy.blank("en")

# Add NER pipe
ner = nlp.add_pipe("ner")

# Add label
ner.add_label("ANIMAL")

# Train only the NER component (this blank pipeline has no other pipes,
# but the pattern matters when extending an existing model)
with nlp.select_pipes(enable="ner"):
    optimizer = nlp.initialize()
    for i in range(20):
        losses = {}
        for text, annotations in TRAIN_DATA:
            doc = nlp.make_doc(text)
            example = Example.from_dict(doc, annotations)
            nlp.update([example], drop=0.35, sgd=optimizer, losses=losses)
        print(f"Iteration {i+1}, Losses: {losses}")

# Test the trained model
test_text = "My neighbor has a dog and a cat"
doc = nlp(test_text)
print("Entities in '%s':" % test_text)
for ent in doc.ents:
    print(ent.text, ent.label_)
Output (illustrative; exact loss values vary between runs)
Iteration 1, Losses: {'ner': 5.123456}
Iteration 2, Losses: {'ner': 3.987654}
...
Iteration 20, Losses: {'ner': 0.123456}
Entities in 'My neighbor has a dog and a cat':
dog ANIMAL
cat ANIMAL
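The loop above updates on one example at a time, always in the same order. A common variation is to shuffle the examples each epoch and update in small batches. The following is a minimal sketch of that idea, assuming spaCy v3; the batch size of 2 and the variable names are arbitrary choices for this tiny dataset, not recommended settings.

```python
import random

import spacy
from spacy.training import Example
from spacy.util import minibatch

TRAIN_DATA = [
    ("I have a dog", {"entities": [(9, 12, "ANIMAL")]}),
    ("She owns a cat", {"entities": [(11, 14, "ANIMAL")]}),
    ("They saw a rabbit", {"entities": [(11, 17, "ANIMAL")]}),
]

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("ANIMAL")
optimizer = nlp.initialize()

# Convert the raw tuples to Example objects once, up front
examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in TRAIN_DATA]

for epoch in range(20):
    random.shuffle(examples)  # vary the order each epoch
    losses = {}
    # Batch size 2 is an arbitrary choice for this sketch
    for batch in minibatch(examples, size=2):
        nlp.update(batch, drop=0.35, sgd=optimizer, losses=losses)

print(losses)
```

With only three sentences the difference is negligible, but the same structure carries over to larger datasets.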

Common Pitfalls

Common mistakes when training custom NER in spaCy include:

  • Not adding new labels to the ner component before training.
  • Updating an existing pipeline without disabling its other components, which can change components you did not intend to retrain.
  • Incorrectly formatting training data; entity offsets must be exact character positions.
  • Training for too few iterations or with too small a dataset, leading to poor results.
  • Not saving the model after training, losing the trained weights.
python
import spacy

nlp = spacy.blank('en')
ner = nlp.add_pipe('ner')

# Wrong: starting training here, before the label is registered.
# This will cause errors or no learning.

# Right: add every custom label before initializing
ner.add_label('ANIMAL')

# Also restrict training to the NER component
# (select_pipes replaces the deprecated disable_pipes in spaCy v3)
with nlp.select_pipes(enable='ner'):
    optimizer = nlp.initialize()
    # training code here
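Character offsets are easy to miscount by hand, so one option is to derive them from the text and to check that each span covers the intended substring. The sketch below uses plain Python only; find_entity_offsets and check_offsets are hypothetical helper names for illustration, not part of spaCy.

```python
def find_entity_offsets(text, entity_text, label):
    """Return (start, end, label) for the first occurrence of entity_text in text."""
    start = text.find(entity_text)
    if start == -1:
        raise ValueError(f"{entity_text!r} not found in {text!r}")
    # The end offset is exclusive, matching Python slicing
    return (start, start + len(entity_text), label)


def check_offsets(text, entities):
    """Return the substring each (start, end, label) annotation actually covers."""
    return [(text[start:end], label) for start, end, label in entities]


print(find_entity_offsets("I have a dog", "dog", "ANIMAL"))   # (9, 12, 'ANIMAL')
print(check_offsets("She owns a cat", [(11, 14, "ANIMAL")]))  # [('cat', 'ANIMAL')]
```

Note that str.find returns only the first match and also matches inside longer words, so this approach suits simple sentences; repeated or embedded mentions need extra handling.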

Quick Reference

Tips for training custom NER with spaCy:

  • Always prepare training data as tuples of (text, {"entities": [(start, end, label)]}).
  • Add all new entity labels to the ner pipe before training.
  • Use nlp.select_pipes() (nlp.disable_pipes() in older versions) to leave other components out of training when you extend an existing pipeline.
  • Train for multiple iterations and monitor loss to ensure learning.
  • Save your model with nlp.to_disk() and load it later with spacy.load().
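The save-and-load step in the last tip can be sketched as a round trip. This is a minimal example assuming spaCy v3; the pipeline here is untrained to keep the sketch short, and the temporary directory is an assumption for illustration (a trained pipeline is saved and loaded the same way).

```python
import tempfile
from pathlib import Path

import spacy

# Build a small pipeline to save (untrained here, for brevity)
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("ANIMAL")
nlp.initialize()

# Write to a temporary directory (the location is an assumption for this sketch)
model_dir = Path(tempfile.mkdtemp()) / "custom_ner_model"
nlp.to_disk(model_dir)

# Reload from disk and confirm the component and its label survived
nlp_loaded = spacy.load(model_dir)
print(nlp_loaded.pipe_names)
print(nlp_loaded.get_pipe("ner").labels)
```

Checking pipe_names and labels after loading is a quick way to confirm that the directory contains the pipeline you expect before using it for predictions.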

Key Takeaways

Prepare training data with exact entity character offsets and labels.
Add custom entity labels to the NER component before training.
Limit training to the NER component so other pipeline components are left unchanged.
Train the model over multiple iterations and monitor loss values.
Save and load your trained model to reuse it for predictions.