
NLP Program to Perform Named Entity Recognition (NER)

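Load spaCy's pre-trained English pipeline and read the entities from the processed document: import spacy; nlp = spacy.load('en_core_web_sm'); doc = nlp(text); entities = [(ent.text, ent.label_) for ent in doc.ents]. This performs Named Entity Recognition (NER) on any English text. The model must be installed first with python -m spacy download en_core_web_sm.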
📋

Examples

Input: Apple is looking at buying U.K. startup for $1 billion
Output: [('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')]
Input: Barack Obama was born in Hawaii.
Output: [('Barack Obama', 'PERSON'), ('Hawaii', 'GPE')]
Input: No entities here.
Output: []
🧠

How to Think About It

To do Named Entity Recognition, first load a language model trained to recognize entities. Then, pass the input text to this model to get a processed document. Extract the entities by looking at the parts of the text the model marks as names of people, places, organizations, or other categories.
📐

Algorithm

1. Load a pre-trained NLP model that supports NER.
2. Pass the text to the model to create a processed document.
3. Extract entities from the document by accessing its entity annotations.
4. Return the list of entities with their text and labels.
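
The four steps above can be sketched as one small function. This is a minimal sketch rather than the only way to structure it: extract_entities is a hypothetical helper name, and it assumes nlp is an already loaded spaCy pipeline (or any callable that returns an object with an ents attribute).

```python
from typing import Callable, List, Tuple


def extract_entities(nlp: Callable, text: str) -> List[Tuple[str, str]]:
    """Return (entity text, entity label) pairs found in `text`.

    `nlp` is assumed to be a loaded spaCy pipeline, for example the result
    of spacy.load('en_core_web_sm'). Loading it (step 1) is left to the
    caller so the model is loaded once and reused across calls.
    """
    doc = nlp(text)  # step 2: process the text
    # steps 3 and 4: read the entity annotations and return them
    return [(ent.text, ent.label_) for ent in doc.ents]
```

With spaCy and the model installed, extract_entities(spacy.load('en_core_web_sm'), text) should return the same kind of list as the examples above.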
💻

Code

python
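import spacy

# Load the small English pipeline (install it first: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')
text = "Apple is looking at buying U.K. startup for $1 billion"
# Run the pipeline; the returned doc carries the entity annotations
doc = nlp(text)
# Collect each entity's text and its label string
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)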
Output
[('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')]
🔍

Dry Run

Let's trace the example 'Apple is looking at buying U.K. startup for $1 billion' through the code.

1. Load model: load spaCy's 'en_core_web_sm' model into the variable nlp.
2. Process text: pass the text 'Apple is looking at buying U.K. startup for $1 billion' to nlp to get doc.
3. Extract entities: read doc.ents and get [('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')].
4. Print result: output the list of entities.

Entity Text | Entity Label
Apple | ORG
U.K. | GPE
$1 billion | MONEY
💡

Why This Works

Step 1: Load pre-trained model

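The spacy.load function loads an installed pipeline by name. 'en_core_web_sm' is a small English pipeline whose entity recognizer was trained on annotated text, so it can label entities without any training on your side.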

Step 2: Process input text

Passing text to nlp runs the tokenizer and the pipeline components, including the entity recognizer, and returns a document object with linguistic annotations.

Step 3: Extract entities

The document's ents property holds the recognized named entities as spans. Each span exposes its text as ent.text and its label as ent.label_.
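
Each entity span carries more than its text and label. The sketch below also collects the character offsets, which is useful for highlighting entities in the original string. describe_entities is a hypothetical helper name, and doc is assumed to be a document already produced by nlp(text).

```python
from typing import Dict, List, Union


def describe_entities(doc) -> List[Dict[str, Union[str, int]]]:
    """List each entity in a processed spaCy doc with its character offsets."""
    return [
        {
            "text": ent.text,         # the entity's surface text
            "label": ent.label_,      # label as a string, e.g. 'ORG'
            "start": ent.start_char,  # character offset where the span starts
            "end": ent.end_char,      # character offset just past the span
        }
        for ent in doc.ents
    ]
```

If a label is unfamiliar, spacy.explain('GPE') returns a short description of it.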

🔄

Alternative Approaches

Using NLTK with a pre-trained classifier
python
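import nltk
from nltk import word_tokenize, pos_tag, ne_chunk

# Requires NLTK's tokenizer, POS tagger, and NE chunker data, installed
# beforehand with nltk.download(...); resource names vary by NLTK version.
text = "Barack Obama was born in Hawaii."
tokens = word_tokenize(text)   # split the sentence into word tokens
pos_tags = pos_tag(tokens)     # tag each token with its part of speech
chunks = ne_chunk(pos_tags)    # group tagged tokens into named-entity subtrees
print(chunks)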
NLTK's NER is less accurate and requires more setup than spaCy but is useful for educational purposes.
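
The NLTK code prints a tree rather than a list of pairs. To get output in the same (text, label) shape as the spaCy version, the tree can be flattened. The following is a minimal sketch: tree_to_entities is a hypothetical helper name, and it assumes the input is the tree returned by ne_chunk, where entity chunks are labeled subtrees and other tokens are (word, tag) tuples.

```python
from typing import List, Tuple


def tree_to_entities(tree) -> List[Tuple[str, str]]:
    """Flatten the tree returned by nltk.ne_chunk into (text, label) pairs."""
    entities = []
    for node in tree:
        # Named-entity chunks are subtrees with a label; plain tokens are
        # (word, pos_tag) tuples and are skipped.
        if hasattr(node, "label"):
            words = [word for word, _tag in node.leaves()]
            entities.append((" ".join(words), node.label()))
    return entities
```

Calling tree_to_entities(chunks) on the result above gives a plain list that can be compared directly with spaCy's output.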
Using Hugging Face transformers with a fine-tuned model
python
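from transformers import pipeline

# aggregation_strategy='simple' merges word pieces into whole entities
# (it replaces the deprecated grouped_entities=True argument).
# With no model argument, the pipeline downloads a default English NER checkpoint.
ner = pipeline('ner', aggregation_strategy='simple')
text = "Apple is looking at buying U.K. startup for $1 billion"
result = ner(text)
print(result)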
Transformers provide state-of-the-art accuracy but need more resources and setup.
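
The grouped pipeline output is a list of dictionaries, not (text, label) pairs. A small conversion step, sketched below, brings it into the same shape as the spaCy result and optionally drops low-confidence entities. to_entity_tuples and min_score are hypothetical names, and the sample data is made up to show the shape of the output, not taken from a real model run.

```python
from typing import Dict, List, Tuple


def to_entity_tuples(results: List[Dict], min_score: float = 0.0) -> List[Tuple[str, str]]:
    """Convert grouped pipeline output into (text, label) pairs.

    Each grouped result is a dict with 'word', 'entity_group', and 'score'
    keys; entries scoring below `min_score` are dropped.
    """
    return [
        (item["word"], item["entity_group"])
        for item in results
        if item["score"] >= min_score
    ]


# Hypothetical sample in the shape the grouped pipeline returns
# (the scores here are made up for illustration).
sample = [
    {"entity_group": "ORG", "score": 0.99, "word": "Apple", "start": 0, "end": 5},
    {"entity_group": "LOC", "score": 0.42, "word": "U.K.", "start": 27, "end": 31},
]
print(to_entity_tuples(sample, min_score=0.5))  # [('Apple', 'ORG')]
```

Label names depend on the model's training data, so they may differ from spaCy's (for example LOC instead of GPE).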

Complexity: O(n) time, O(n) space

Time Complexity

spaCy tokenizes the text and its entity recognizer does a bounded amount of work per token, so running time grows roughly linearly with text length.

Space Complexity

The space needed grows with the number of tokens and entities stored.
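
One rough way to check the linear-growth claim is to time the pipeline on progressively longer inputs. The sketch below is only an illustration: time_by_length is a hypothetical helper, nlp is assumed to be a loaded pipeline (any callable taking a string works), and single-run timings are noisy, so treat the numbers as indicative.

```python
import time
from typing import Callable, List, Sequence, Tuple


def time_by_length(
    nlp: Callable,
    base_text: str,
    multipliers: Sequence[int] = (1, 2, 4, 8),
) -> List[Tuple[int, float]]:
    """Time `nlp` on inputs built by repeating `base_text`.

    Returns (number of characters, elapsed seconds) pairs.
    """
    timings = []
    for k in multipliers:
        text = " ".join([base_text] * k)  # k copies of the base text
        start = time.perf_counter()
        nlp(text)
        timings.append((len(text), time.perf_counter() - start))
    return timings
```

If processing is linear, doubling the input length should roughly double the elapsed time once the text is long enough for fixed overhead to stop dominating.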

Which Approach is Fastest?

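spaCy is generally the fastest of the three and the easiest to set up. NLTK is slower and less accurate. Transformers are usually the most accurate but also the slowest and most resource-intensive.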

Approach | Time | Space | Best For
spaCy | O(n) | O(n) | Fast, accurate NER with minimal setup
NLTK | O(n) | O(n) | Educational use, simpler setups
Transformers | O(n) | O(n) | High accuracy, resource-intensive
💡
For quick NER without training your own model, start with a pre-trained pipeline such as spaCy's 'en_core_web_sm'.
⚠️
Beginners often forget to download and load the correct language model (python -m spacy download en_core_web_sm), or try to extract entities before processing the text with nlp.
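
The first mistake, a missing model, can be turned into a clearer error message. This is a minimal sketch under stated assumptions: load_pipeline and its loader parameter are hypothetical names introduced here, and the sketch relies on spacy.load raising OSError when the named model cannot be found.

```python
def load_pipeline(name: str = "en_core_web_sm", loader=None):
    """Load a spaCy pipeline, raising a clearer error if it is not installed.

    `loader` is a hypothetical hook for substituting the loading function;
    by default spacy.load is used.
    """
    if loader is None:
        import spacy  # imported here so spaCy is only required for the default loader
        loader = spacy.load
    try:
        return loader(name)
    except OSError as err:
        # spacy.load raises OSError when the named model cannot be found
        raise RuntimeError(
            f"Model '{name}' is not installed. "
            f"Install it with: python -m spacy download {name}"
        ) from err
```

The second mistake is avoided by always calling doc = nlp(text) first and reading doc.ents from the result, never from the raw string.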