How to Do Named Entity Recognition (NER) Using spaCy in NLP
To do Named Entity Recognition (NER) using spaCy, load a pre-trained language model like en_core_web_sm, process your text with it, and then extract entities from the doc.ents attribute. Each entity has a text and a label describing its type, such as person or organization.
Syntax
To perform NER with spaCy, you first load a language model, then pass your text to the model to create a Doc object. You access recognized entities via doc.ents, where each entity has .text and .label_ properties.
- spacy.load(): Loads the language model.
- nlp(text): Processes the text and returns a Doc object.
- doc.ents: The entities found in the text, as a tuple of Span objects.
- ent.text: The entity text.
- ent.label_: The entity type label as a string.
```python
import spacy

# Load the English model
nlp = spacy.load('en_core_web_sm')

# Process text
text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)

# Extract entities
for ent in doc.ents:
    print(ent.text, ent.label_)
```
Output
Apple ORG
U.K. GPE
$1 billion MONEY
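The labels in the output above (ORG, GPE, MONEY) are abbreviations. As a small sketch, spaCy's spacy.explain() helper can look up a short description for each one. The describe_labels function below is a hypothetical helper written for this illustration and is not part of spaCy.

```python
import spacy


def describe_labels(labels):
    """Map each label to spaCy's glossary description (None if the label is unknown)."""
    return {label: spacy.explain(label) for label in labels}


# Labels copied from the output above; no model needs to be loaded for this lookup.
for label, description in describe_labels(["ORG", "GPE", "MONEY"]).items():
    print(f"{label}: {description}")
```

This can be handy when a model returns a label you do not recognize.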
Example
This example shows how to load spaCy's English model, process a sentence, and print all detected named entities with their labels.
```python
import spacy

# Load the small English model
nlp = spacy.load('en_core_web_sm')

# Sample text
text = "Barack Obama was born in Hawaii. He was elected president in 2008."

# Process the text
doc = nlp(text)

# Print entities and their labels
for ent in doc.ents:
    print(f"Entity: {ent.text}, Label: {ent.label_}")
```
Output
Entity: Barack Obama, Label: PERSON
Entity: Hawaii, Label: GPE
Entity: 2008, Label: DATE
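Once entities are extracted, a common next step is to collect them by type. The sketch below assumes the entities have already been turned into (text, label) pairs; group_by_label is a hypothetical helper name, and the sample pairs are copied from the output above rather than produced by running a model.

```python
from collections import defaultdict


def group_by_label(entities):
    """Group (text, label) pairs into {label: [text, ...]}, keeping input order."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)


# Pairs copied from the output above. With a real Doc you would build them as
# [(ent.text, ent.label_) for ent in doc.ents].
pairs = [("Barack Obama", "PERSON"), ("Hawaii", "GPE"), ("2008", "DATE")]
print(group_by_label(pairs))
```

Keeping the helper independent of spaCy objects makes it easy to reuse on results that were saved earlier.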
Common Pitfalls
Common mistakes when doing NER with spaCy include:
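- Not loading a model before processing text: calling nlp(text) before nlp = spacy.load(...) has run raises a NameError, and spacy.load() itself raises an OSError if the model package is not installed.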
- Using a blank model without NER capabilities, so no entities are detected.
- Confusing ent.label_ (string label) with ent.label (integer ID).
- Expecting perfect entity recognition; spaCy models may miss or mislabel entities.
Always check if the model supports NER and use the correct attributes to access entity information.
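The difference between ent.label and ent.label_ mentioned in the list above can be seen in a minimal sketch. To keep it independent of a model download, it assumes a blank pipeline and marks the entity by hand with a Span; with a trained model, the entities in doc.ents expose the same two attributes.

```python
import spacy
from spacy.tokens import Span

# Assumption for illustration: a blank pipeline, with the entity set manually.
nlp = spacy.blank("en")
doc = nlp("Google is a company.")
doc.ents = [Span(doc, 0, 1, label="ORG")]  # token 0 is "Google"

ent = doc.ents[0]
print(type(ent.label).__name__)      # the label as an integer ID
print(ent.label_)                    # the label as a string
print(doc.vocab.strings[ent.label])  # the ID resolved back to its string
```

Comparing ent.label against a string such as 'ORG' is always False, which is why filters should use ent.label_.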
```python
import spacy

# Wrong: loading blank model without NER
nlp_blank = spacy.blank('en')
doc_blank = nlp_blank('Google is a company.')
print('Entities found:', list(doc_blank.ents))  # Outputs empty list

# Right: load full model with NER
nlp = spacy.load('en_core_web_sm')
doc = nlp('Google is a company.')
print('Entities found:', [(ent.text, ent.label_) for ent in doc.ents])
```
Output
Entities found: []
Entities found: [('Google', 'ORG')]
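One way to catch the blank-model mistake early is to inspect the pipeline's component names before relying on doc.ents. The sketch below uses nlp.pipe_names; has_ner is a hypothetical helper, and it assumes the entity recognizer is registered under the default component name 'ner', which may not hold for custom pipelines.

```python
import spacy


def has_ner(nlp):
    """Return True if the pipeline has a component named 'ner' (the default name)."""
    return "ner" in nlp.pipe_names


blank = spacy.blank("en")
print(blank.pipe_names)  # a blank pipeline has no components
print(has_ner(blank))
```

For a pipeline loaded with spacy.load('en_core_web_sm'), the same check is expected to return True, since that package ships an 'ner' component.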
Quick Reference
| Step | Description |
|---|---|
| Load model | Use spacy.load('en_core_web_sm') to load the English model with NER |
| Process text | Call nlp(text) to create a Doc object |
| Access entities | Use doc.ents to get list of entities |
| Entity text | Use ent.text to get the entity string |
| Entity label | Use ent.label_ to get the entity type as string |
Key Takeaways
Load a spaCy model with NER support before processing text.
Use doc.ents to access recognized named entities in the text.
Each entity has a text and a label describing its type.
Avoid using blank models without NER as they find no entities.
Entity recognition is not perfect; verify results for critical use.
