
NLP Program to Perform Named Entity Recognition (NER)

Use spaCy's pre-trained model with

import spacy; nlp = spacy.load('en_core_web_sm'); doc = nlp(text); entities = [(ent.text, ent.label_) for ent in doc.ents]

to perform Named Entity Recognition (NER) on English text.

📋

Examples

Input: Apple is looking at buying U.K. startup for $1 billion

Output: [('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')]

Input: Barack Obama was born in Hawaii.

Output: [('Barack Obama', 'PERSON'), ('Hawaii', 'GPE')]

Input: No entities here.

Output: []

🧠

How to Think About It

To do Named Entity Recognition, first load a language model trained to recognize entities. Then, pass the input text to this model to get a processed document. Extract the entities by looking at the parts of the text the model marks as names of people, places, organizations, or other categories.

📐

Algorithm

Load a pre-trained NLP model that supports NER.

Input the text to the model to create a processed document.

Extract entities from the document by accessing its entity annotations.

Return the list of entities with their text and labels.
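The four steps above can be sketched as one small helper. This is a minimal sketch, not an official spaCy function: the name extract_entities is made up for illustration, and it assumes nlp is a pipeline that has already been loaded (step 1), for example with spacy.load('en_core_web_sm').

```python
def extract_entities(nlp, text):
    """Return (entity text, entity label) pairs found in `text`.

    `nlp` is assumed to be an already-loaded pipeline with an NER
    component, so the model is loaded once and reused for every call.
    """
    doc = nlp(text)  # step 2: process the text into a document
    # steps 3 and 4: read the entity annotations and return text/label pairs
    return [(ent.text, ent.label_) for ent in doc.ents]
```

Passing the pipeline in as an argument keeps the slow loading step out of the function, so the same nlp object can be reused for many texts.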

💻

Code

python

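import spacy

# Load the small English pipeline (install it once with:
#   python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')
text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)  # run the pipeline; the result is a Doc with annotations
# Each entity is a span of the Doc with its text and a label string
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)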

Output

[('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')]
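Once you have this list of tuples, a common next step is to group the entities by label. The sketch below uses only the standard library; group_by_label is an illustrative name, and the input is the list shown above copied in as plain data.

```python
from collections import defaultdict


def group_by_label(entities):
    """Map each label to the list of entity texts that carry it."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)


# The list printed by the code above, copied here as plain data.
entities = [('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')]
print(group_by_label(entities))
# {'ORG': ['Apple'], 'GPE': ['U.K.'], 'MONEY': ['$1 billion']}
```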

🔍

Dry Run

Let's trace the example 'Apple is looking at buying U.K. startup for $1 billion' through the code:

Load model

Load spaCy's 'en_core_web_sm' model into variable nlp

Process text

Pass text 'Apple is looking at buying U.K. startup for $1 billion' to nlp to get doc

Extract entities

Look at doc.ents and get [('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')]

Print result

Output the list of entities

Entity Text	Entity Label
Apple	ORG
U.K.	GPE
$1 billion	MONEY

💡

Why This Works

Step 1: Load pre-trained model

The spacy.load function loads a pre-trained pipeline from disk; en_core_web_sm includes a named entity recognizer trained on annotated English text.

Step 2: Process input text

Passing text to nlp creates a document object with linguistic annotations.

Step 3: Extract entities

The document's ents property holds the recognized entities as spans of the document; each span gives its text through text and its label through label_.
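Each entity also records where it sits in the original string through its start_char and end_char attributes. The sketch below uses those offsets to mark entities inline. The helper names entity_offsets and mark_entities are illustrative, and the offsets in the example are written by hand for the 'Barack Obama' sentence from the Examples section rather than produced by a model run.

```python
def entity_offsets(doc):
    """Collect (start_char, end_char, label) for each entity in a processed doc."""
    return [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]


def mark_entities(text, spans):
    """Wrap each (start, end, label) span of `text` as [entity|LABEL]."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])            # text before the entity
        pieces.append(f"[{text[start:end]}|{label}]")
        cursor = end
    pieces.append(text[cursor:])                     # text after the last entity
    return "".join(pieces)


text = "Barack Obama was born in Hawaii."
# Hand-written offsets for the two entities in this sentence.
spans = [(0, 12, "PERSON"), (25, 31, "GPE")]
print(mark_entities(text, spans))
# [Barack Obama|PERSON] was born in [Hawaii|GPE].
```

With a real pipeline you would pass entity_offsets(nlp(text)) as the spans argument.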

🔄

Alternative Approaches

Using NLTK with a pre-trained classifier

python

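import nltk
from nltk import word_tokenize, pos_tag, ne_chunk

# One-time setup: fetch the tokenizer, tagger, and chunker data.
# Resource names differ between NLTK releases; if a LookupError
# names a different resource, download that one instead.
for resource in ('punkt', 'averaged_perceptron_tagger', 'maxent_ne_chunker', 'words'):
    nltk.download(resource)

text = "Barack Obama was born in Hawaii."
tokens = word_tokenize(text)    # split into word tokens
pos_tags = pos_tag(tokens)      # tag each token with its part of speech
chunks = ne_chunk(pos_tags)     # group tagged tokens into named-entity chunks
print(chunks)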

NLTK's NER is less accurate and requires more setup than spaCy but is useful for educational purposes.
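Unlike spaCy, ne_chunk returns a tree rather than a list of tuples: entity chunks are subtrees with a label, and all other tokens are plain (word, tag) pairs. The sketch below flattens that tree into the same (text, label) format used elsewhere on this page. The name tree_to_entities is illustrative, and NLTK's label names may differ from spaCy's.

```python
def tree_to_entities(chunks):
    """Flatten the result of ne_chunk into (entity text, label) pairs."""
    entities = []
    for node in chunks:
        # Entity chunks are subtrees with a label() method;
        # ordinary tokens are (word, pos_tag) tuples and are skipped.
        if hasattr(node, "label"):
            words = [word for word, _tag in node.leaves()]
            entities.append((" ".join(words), node.label()))
    return entities
```

Calling tree_to_entities(chunks) on the tree from the code above gives a list you can compare directly with spaCy's output.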

Using Hugging Face transformers with a fine-tuned model

python

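from transformers import pipeline

# With no model argument, the pipeline downloads a default English NER model.
# aggregation_strategy='simple' merges word pieces into whole entities
# (it replaces the deprecated grouped_entities=True).
ner = pipeline('ner', aggregation_strategy='simple')
text = "Apple is looking at buying U.K. startup for $1 billion"
result = ner(text)
print(result)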

Transformers provide state-of-the-art accuracy but need more resources and setup.
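With grouping enabled, the pipeline returns a list of dictionaries with keys such as 'entity_group', 'word', 'score', 'start', and 'end'. The sketch below converts that into (text, label) tuples and drops low-confidence hits. The name to_entity_tuples and the min_score threshold are illustrative, and the sample data is made up to show the shape of the result; it is not real model output. Label names depend on the model you load, so they may not match spaCy's.

```python
def to_entity_tuples(results, min_score=0.0):
    """Convert grouped pipeline results to (text, label) pairs above a score."""
    return [
        (item["word"], item["entity_group"])
        for item in results
        if item["score"] >= min_score
    ]


# Made-up sample in the shape of grouped pipeline output (scores are invented).
sample = [
    {"entity_group": "ORG", "score": 0.99, "word": "Apple", "start": 0, "end": 5},
    {"entity_group": "LOC", "score": 0.42, "word": "U.K.", "start": 27, "end": 31},
]
print(to_entity_tuples(sample, min_score=0.5))
# [('Apple', 'ORG')]
```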

⚡

Complexity: O(n) time, O(n) space

Time Complexity

spaCy's pipeline does a bounded amount of work per token, so running time grows roughly linearly with text length.

Space Complexity

The space needed grows with the number of tokens and entities stored.

Which Approach is Fastest?

spaCy is faster and easier to use than NLTK; transformers are more accurate but slower.

Approach	Time	Space	Best For
spaCy	O(n)	O(n)	Fast, accurate NER with minimal setup
NLTK	O(n)	O(n)	Educational use, learning the classic pipeline
Transformers	O(n²) per input window (self-attention)	O(n²) per input window	High accuracy, resource-intensive
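The growth rates in the table can be checked empirically on your own machine. Here is a minimal timing sketch using only the standard library; best_time and scaling_table are illustrative names, and fn stands for whichever callable you want to measure (for example a loaded spaCy nlp object).

```python
import time


def best_time(fn, arg, repeats=3):
    """Return the fastest of `repeats` wall-clock timings of fn(arg), in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        timings.append(time.perf_counter() - start)
    return min(timings)


def scaling_table(fn, base_text, factors=(1, 2, 4, 8)):
    """Time fn on base_text repeated 1x, 2x, 4x, ... as (characters, seconds) rows."""
    return [(len(base_text) * k, best_time(fn, base_text * k)) for k in factors]
```

If the seconds column roughly doubles each time the character count doubles, the approach is behaving linearly on your inputs. Model loading time is not included, so load the model before timing.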

💡

Start with a pre-trained model like spaCy's 'en_core_web_sm' for quick, reasonably accurate NER without training your own model.

⚠️

Beginners often forget to download and load the correct language model, or try to read entities from the raw string instead of from the document returned by nlp(text).
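Both mistakes can be caught early with small guards. The sketch below is one possible way to do it: load_pipeline and entities_of are illustrative names, and the loader parameter is an assumption added so the function can be tried without spaCy installed. By default it falls back to spacy.load, which raises OSError when the model package is missing.

```python
def load_pipeline(name="en_core_web_sm", loader=None):
    """Load a model, turning a missing-model error into an actionable message."""
    if loader is None:
        import spacy  # imported lazily; requires `pip install spacy`
        loader = spacy.load
    try:
        return loader(name)
    except OSError as err:
        raise RuntimeError(
            f"Model {name!r} is not installed. "
            f"Try: python -m spacy download {name}"
        ) from err


def entities_of(doc):
    """Return (text, label) pairs, rejecting unprocessed strings."""
    if isinstance(doc, str):
        raise TypeError(
            "Call nlp(text) first: entities live on the processed document, "
            "not on the raw string."
        )
    return [(ent.text, ent.label_) for ent in doc.ents]
```

Typical use is nlp = load_pipeline() followed by entities_of(nlp(text)).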