NLP Program to Perform Named Entity Recognition (NER)
Use `import spacy; nlp = spacy.load('en_core_web_sm'); doc = nlp(text); entities = [(ent.text, ent.label_) for ent in doc.ents]` to perform Named Entity Recognition (NER) on English text.
Examples
How to Think About It
Algorithm
Code
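```python
import spacy

# Load the small pre-trained English pipeline
nlp = spacy.load('en_core_web_sm')

text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)  # tokenize and annotate the text

# Collect each entity's text and label
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```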
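For reuse, the same logic can be wrapped in a small helper. This is a minimal sketch rather than part of the original program: the names `extract_entities` and `demo` are hypothetical, and the helper accepts any spaCy-style callable so that it does not depend on one particular model.

```python
from typing import Callable, List, Tuple


def extract_entities(nlp: Callable, text: str) -> List[Tuple[str, str]]:
    """Return (entity text, entity label) pairs found by a spaCy-style pipeline."""
    doc = nlp(text)  # run the pipeline on the raw string
    return [(ent.text, ent.label_) for ent in doc.ents]


def demo() -> None:
    """Hypothetical entry point; assumes spaCy and en_core_web_sm are installed."""
    import spacy  # imported lazily so extract_entities itself needs no spaCy import

    nlp = spacy.load("en_core_web_sm")
    text = "Apple is looking at buying U.K. startup for $1 billion"
    print(extract_entities(nlp, text))
```

Calling `demo()` runs the same pipeline as the program above, while `extract_entities` can be reused with any other loaded model.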
Dry Run
Let's trace the example 'Apple is looking at buying U.K. startup for $1 billion' through the code.
Load model
Load spaCy's 'en_core_web_sm' model into variable nlp
Process text
Pass text 'Apple is looking at buying U.K. startup for $1 billion' to nlp to get doc
Extract entities
Look at doc.ents and get [('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')]
Print result
Output the list of entities
| Entity Text | Entity Label |
|---|---|
| Apple | ORG |
| U.K. | GPE |
| $1 billion | MONEY |
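The table above can be generated from the entity list itself. The sketch below assumes the list of (text, label) pairs from the dry run; `format_entity_table` is a hypothetical helper name, not a spaCy function.

```python
def format_entity_table(entities):
    """Render (text, label) pairs as a pipe-delimited table string."""
    rows = ["| Entity Text | Entity Label |", "|---|---|"]
    rows += [f"| {text} | {label} |" for text, label in entities]
    return "\n".join(rows)


# Entity list taken from the dry run above
entities = [("Apple", "ORG"), ("U.K.", "GPE"), ("$1 billion", "MONEY")]
print(format_entity_table(entities))
```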
Why This Works
Step 1: Load pre-trained model
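The spacy.load function loads a pre-trained pipeline, here the small English model 'en_core_web_sm', which already knows how to recognize entities. The model must be installed first, for example with python -m spacy download en_core_web_sm.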
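If the model package is not installed, spacy.load raises an OSError. One way to turn that into a clearer message is sketched below; `load_model` and its `loader` parameter are hypothetical names introduced for illustration (the parameter exists only so the function can be exercised without spaCy).

```python
def load_model(name="en_core_web_sm", loader=None):
    """Load a spaCy pipeline, or fail with an installation hint."""
    if loader is None:
        import spacy  # imported lazily; assumes spaCy is installed

        loader = spacy.load
    try:
        return loader(name)
    except OSError as exc:
        # spacy.load raises OSError when the model package cannot be found
        raise RuntimeError(
            f"Model '{name}' is not installed; try: python -m spacy download {name}"
        ) from exc
```

With spaCy installed, `nlp = load_model()` behaves like the plain spacy.load call in the program.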
Step 2: Process input text
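Calling nlp(text) tokenizes the text and runs the pipeline's components, such as the tagger, parser and entity recognizer, returning a Doc object that carries their annotations.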
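The entity annotations are also visible on individual tokens through the ent_iob_ and ent_type_ attributes. The following sketch lists them for a processed document; the function names are hypothetical, and `doc` is assumed to be the object returned by nlp(text).

```python
def token_entity_tags(doc):
    """Return (token text, IOB code, entity type) for every token in a Doc."""
    return [(tok.text, tok.ent_iob_, tok.ent_type_) for tok in doc]


def entity_tokens_only(doc):
    """Keep only tokens that begin ('B') or continue ('I') an entity."""
    return [row for row in token_entity_tags(doc) if row[1] in ("B", "I")]
```

A token outside any entity carries the code 'O' and an empty entity type.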
Step 3: Extract entities
The document's ents property holds the recognized named entities as spans; each span exposes its text through ent.text and its label string through ent.label_.
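Once the (text, label) pairs are extracted, they are often easier to use when grouped by label. A minimal sketch in plain Python, with `group_by_label` as a hypothetical helper name:

```python
from collections import defaultdict


def group_by_label(entities):
    """Map each entity label to the list of entity texts carrying it."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)


entities = [("Apple", "ORG"), ("U.K.", "GPE"), ("$1 billion", "MONEY")]
print(group_by_label(entities))
# {'ORG': ['Apple'], 'GPE': ['U.K.'], 'MONEY': ['$1 billion']}
```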
Alternative Approaches
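```python
# Alternative 1: NLTK's built-in named-entity chunker
import nltk
from nltk import word_tokenize, pos_tag, ne_chunk

# Assumes the required NLTK data packages were fetched beforehand with nltk.download()
text = "Barack Obama was born in Hawaii."
tokens = word_tokenize(text)   # split into word tokens
pos_tags = pos_tag(tokens)     # tag each token with its part of speech
chunks = ne_chunk(pos_tags)    # group tagged tokens into a named-entity tree
print(chunks)
```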
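```python
# Alternative 2: a Hugging Face transformers pipeline
from transformers import pipeline

# aggregation_strategy='simple' merges word pieces into whole entities;
# it replaces the deprecated grouped_entities=True flag
ner = pipeline('ner', aggregation_strategy='simple')
text = "Apple is looking at buying U.K. startup for $1 billion"
result = ner(text)
print(result)
```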
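The two alternatives return different shapes: NLTK gives a tree whose entity subtrees have a label, and the transformers pipeline gives a list of dictionaries. The sketch below converts both into the same (text, label) pairs used by the spaCy program. The function names are hypothetical, and the dictionary keys 'word' and 'entity_group' assume the pipeline was run with entity aggregation enabled.

```python
def entities_from_nltk_tree(tree):
    """Collect (text, label) pairs from the tree returned by ne_chunk."""
    entities = []
    for node in tree:
        # Entity subtrees expose label(); ordinary tokens are (word, tag) tuples
        if hasattr(node, "label"):
            words = [word for word, _tag in node.leaves()]
            entities.append((" ".join(words), node.label()))
    return entities


def entities_from_transformers(results):
    """Collect (text, label) pairs from aggregated NER pipeline output."""
    return [(item["word"], item["entity_group"]) for item in results]
```

Note that the label sets differ between libraries (for example PERSON in NLTK versus PER in many transformer models), so results are comparable only after mapping labels.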
Complexity: O(n) time, O(n) space
Time Complexity
spaCy makes a fixed number of passes over the tokens, so running time grows roughly linearly with the number of tokens n.
Space Complexity
The space needed grows with the number of tokens and entities stored.
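When many texts are processed, memory can be kept bounded by streaming them instead of building every document up front. spaCy's nlp.pipe yields documents lazily, so the sketch below hands back one entity list at a time; `iter_entities` is a hypothetical helper name and the batch size of 64 is an arbitrary assumption.

```python
def iter_entities(nlp, texts, batch_size=64):
    """Yield one list of (text, label) pairs per input text, lazily."""
    # nlp.pipe processes texts in batches and yields Doc objects one by one
    for doc in nlp.pipe(texts, batch_size=batch_size):
        yield [(ent.text, ent.label_) for ent in doc.ents]
```

Because the function is a generator, a caller can loop over a large file line by line without holding all results in memory.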
Which Approach is Fastest?
spaCy's small model is typically faster than NLTK and needs less setup; transformer models are usually more accurate but considerably slower, especially without a GPU.
| Approach | Time | Space | Best For |
|---|---|---|---|
| spaCy | O(n) | O(n) | Fast, accurate NER with minimal setup |
| NLTK | O(n) | O(n) | Educational use, simpler setups |
| Transformers | O(n²) per input window (self-attention) | O(n²) | High accuracy, resource-intensive |
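Relative speed depends on hardware, model size and text length, so it is worth measuring on your own data. The sketch below times any set of extractor callables on the same text; `time_extractors` is a hypothetical helper, and each extractor is assumed to be a function that takes a string and returns entities.

```python
import time


def time_extractors(extractors, text, repeats=3):
    """Return the best wall-clock time in seconds for each named extractor."""
    timings = {}
    for name, extract in extractors.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            extract(text)  # result is discarded; only the duration matters
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings
```

For example, passing a dictionary such as {"spaCy": spacy_fn, "NLTK": nltk_fn}, where both values are your own wrapper functions, gives one timing per approach for direct comparison.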
