
How to Do Named Entity Recognition (NER) Using spaCy in NLP

To perform Named Entity Recognition (NER) with spaCy, load a trained pipeline such as en_core_web_sm, process your text with it, and read the entities from the doc.ents attribute. Each entity has its text and a label describing its type, such as PERSON or ORG (organization).

Syntax

To perform NER with spaCy, first load a language model, then pass your text to it to create a Doc object. The recognized entities are available through doc.ents, where each entity is a Span with .text and .label_ attributes.

  • spacy.load(): Loads the language model.
  • nlp(text): Processes the text and returns a Doc object.
  • doc.ents: Tuple of the entity spans found in the text.
  • ent.text: The entity text.
  • ent.label_: The entity type label as a string.
python
import spacy

# Load the English model
nlp = spacy.load('en_core_web_sm')

# Process text
text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)

# Extract entities
for ent in doc.ents:
    print(ent.text, ent.label_)
Output
Apple ORG
U.K. GPE
$1 billion MONEY
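The output above comes from the trained en_core_web_sm model. To look at more of each entity's attributes without downloading a model, the sketch below assumes spaCy v3 and builds a blank English pipeline with a rule-based entity_ruler component. The two patterns are illustrative assumptions standing in for trained NER, so only the entities they list are found (U.K. is not).

```python
import spacy

# Blank English pipeline: tokenizer only, no trained NER component
nlp = spacy.blank("en")

# Assumption: a rule-based entity_ruler stands in for a trained NER model
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple"},
    {"label": "MONEY", "pattern": "$1 billion"},
])

doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

for ent in doc.ents:
    # ent.label is an integer hash; the StringStore maps it back to ent.label_
    label_from_id = nlp.vocab.strings[ent.label]
    # start_char/end_char are character offsets into doc.text
    print(ent.text, ent.label_, label_from_id, ent.start_char, ent.end_char)
```

The character offsets are useful when you need to highlight or replace entities in the original string, for example with doc.text[ent.start_char:ent.end_char].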

Example

This example shows how to load spaCy's English model, process a sentence, and print all detected named entities with their labels.

python
import spacy

# Load the small English model
nlp = spacy.load('en_core_web_sm')

# Sample text
text = "Barack Obama was born in Hawaii. He was elected president in 2008."

# Process the text
doc = nlp(text)

# Print entities and their labels
for ent in doc.ents:
    print(f"Entity: {ent.text}, Label: {ent.label_}")
Output
Entity: Barack Obama, Label: PERSON
Entity: Hawaii, Label: GPE
Entity: 2008, Label: DATE
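To collect the detected entities by type (for example, every PERSON mentioned in a document), a small helper can iterate over doc.ents. The helper name entities_by_label is hypothetical, and to keep the sketch runnable without a downloaded model it assumes spaCy v3 and uses a blank pipeline with an entity_ruler in place of en_core_web_sm's trained NER.

```python
from collections import defaultdict

import spacy
from spacy.tokens import Doc


def entities_by_label(doc: Doc) -> dict[str, list[str]]:
    """Hypothetical helper: map each entity label to its entity texts, in document order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for ent in doc.ents:
        groups[ent.label_].append(ent.text)
    return dict(groups)


# Assumption: rule-based stand-in for a trained NER model
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Barack Obama"},
    {"label": "GPE", "pattern": "Hawaii"},
    {"label": "DATE", "pattern": "2008"},
])

doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
print(entities_by_label(doc))
```

With a trained pipeline, replace the blank pipeline and ruler with nlp = spacy.load('en_core_web_sm'); the helper itself only relies on doc.ents.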

Common Pitfalls

Common mistakes when doing NER with spaCy include:

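  • Not loading a model before processing text, or calling spacy.load('en_core_web_sm') before that model is installed, which raises an OSError (install it with python -m spacy download en_core_web_sm).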
  • Using a blank model without NER capabilities, so no entities are detected.
  • Confusing ent.label_ (string label) with ent.label (integer ID).
  • Expecting perfect entity recognition; spaCy models may miss or mislabel entities.

Always confirm that the loaded pipeline includes an NER component (for example, that "ner" appears in nlp.pipe_names), and use the correct attributes to access entity information.

python
import spacy

# Wrong: loading blank model without NER
nlp_blank = spacy.blank('en')
doc_blank = nlp_blank('Google is a company.')
print('Entities found:', list(doc_blank.ents))  # Outputs empty list

# Right: load full model with NER
nlp = spacy.load('en_core_web_sm')
doc = nlp('Google is a company.')
print('Entities found:', [(ent.text, ent.label_) for ent in doc.ents])
Output
Entities found: []
Entities found: [('Google', 'ORG')]
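Both the missing-model and blank-model pitfalls can be caught up front. The sketch below assumes spaCy v3, where spacy.load() raises an OSError for a pipeline that is not installed; the helper name load_ner_pipeline is hypothetical. It converts that error into a clearer message and checks nlp.pipe_names for an "ner" component before you rely on doc.ents.

```python
import spacy
from spacy.language import Language


def load_ner_pipeline(name: str = "en_core_web_sm") -> Language:
    """Hypothetical helper: load a trained pipeline and confirm it can do NER."""
    try:
        nlp = spacy.load(name)
    except OSError as err:
        # spacy.load() raises OSError when the package is not installed;
        # install it first, e.g. `python -m spacy download en_core_web_sm`
        raise RuntimeError(f"Pipeline {name!r} is not installed") from err
    if "ner" not in nlp.pipe_names:
        # A pipeline without an "ner" component would leave doc.ents empty
        raise ValueError(f"Pipeline {name!r} has no 'ner' component")
    return nlp
```

Failing loudly here is a design choice: an empty doc.ents is otherwise indistinguishable from text that genuinely contains no entities.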

Quick Reference

| Step | Description |
| --- | --- |
| Load model | Use spacy.load('en_core_web_sm') to load the English model with NER |
| Process text | Call nlp(text) to create a Doc object |
| Access entities | Use doc.ents to get the entities |
| Entity text | Use ent.text to get the entity string |
| Entity label | Use ent.label_ to get the entity type as a string |
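The steps in the table can be wrapped in one function. The sketch below is hypothetical (extract_entities is not a spaCy API) and assumes spaCy v3. It uses nlp.pipe(), which yields one Doc per input text and is generally more efficient than calling nlp() in a loop when you have many texts. In practice you would pass a pipeline from spacy.load('en_core_web_sm'); the usage here substitutes a blank pipeline with an entity_ruler so it runs without a model download.

```python
import spacy
from spacy.language import Language


def extract_entities(nlp: Language, texts: list[str]) -> list[list[tuple[str, str]]]:
    """Hypothetical helper: return (entity text, label) pairs for each input text."""
    results = []
    # nlp.pipe() processes the texts as a stream, yielding Docs in input order
    for doc in nlp.pipe(texts):
        results.append([(ent.text, ent.label_) for ent in doc.ents])
    return results


# In practice: nlp = spacy.load("en_core_web_sm")
# Here, a rule-based stand-in (assumption) so the example runs without a download
nlp = spacy.blank("en")
nlp.add_pipe("entity_ruler").add_patterns([
    {"label": "ORG", "pattern": "Google"},
    {"label": "GPE", "pattern": "Hawaii"},
])

print(extract_entities(nlp, ["Google is a company.", "Hawaii is sunny.", "No entities here."]))
```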

Key Takeaways

  • Load a spaCy model with NER support before processing text.
  • Use doc.ents to access the recognized named entities in the text.
  • Each entity has its text and a label describing its type.
  • Avoid blank models without an NER component, as they find no entities.
  • Entity recognition is not perfect; verify the results before relying on them in critical applications.