
NER with NLTK in NLP

Introduction

Named entity recognition (NER) automatically finds the names of people, places, organizations, and other entities in text. Tagging these names turns free-form text into structured information that a program can search, count, or link.

Typical situations where NER helps:
You want to find the names of people mentioned in news articles.
You need to extract locations from travel blogs.
You want to identify organizations in business reports.
You want to highlight the names of people and companies in emails automatically.
Syntax
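import nltk
from nltk import word_tokenize, pos_tag, ne_chunk

# Requires the NLTK data packages downloaded in the Sample Model section below
text = "Your text here"
tokens = word_tokenize(text)     # split the text into tokens
pos_tags = pos_tag(tokens)       # label each token with a part-of-speech tag
ner_tree = ne_chunk(pos_tags)    # group tagged tokens into named-entity chunks

print(ner_tree)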

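word_tokenize splits the text into words and punctuation marks.

pos_tag attaches a part-of-speech tag to each token, and ne_chunk relies on these tags.

ne_chunk returns an nltk.Tree in which each recognized entity is a subtree labeled with its type, such as PERSON, ORGANIZATION, or GPE (geopolitical entity).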
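The tree that ne_chunk returns has a root labeled S. Each child is either a plain (word, tag) tuple or an entity subtree. The sketch below builds a tree of that shape by hand with nltk.Tree, so you can inspect the structure without downloading any models. The POS tags and entity labels are illustrative assumptions, not actual model output.

```python
from nltk import Tree

# Hand-built tree shaped like ne_chunk output for the second example sentence
# (tags and labels are illustrative, not produced by the model)
tree = Tree("S", [
    Tree("ORGANIZATION", [("Apple", "NNP")]),
    ("is", "VBZ"),
    ("looking", "VBG"),
    ("at", "IN"),
    ("buying", "VBG"),
    Tree("GPE", [("U.K.", "NNP")]),
    ("startup", "NN"),
    ("for", "IN"),
    ("$", "$"),
    ("1", "CD"),
    ("billion", "CD"),
])

for child in tree:
    if isinstance(child, Tree):
        # Entity chunk: label() gives the type, leaves() gives the (word, tag) pairs
        words = " ".join(word for word, tag in child.leaves())
        print(f"entity  {child.label():<12} {words}")
    else:
        # Ordinary token: just a (word, tag) tuple
        word, tag = child
        print(f"token   {tag:<12} {word}")
```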

Examples
This example finds the person and location names in a simple sentence.
import nltk
from nltk import word_tokenize, pos_tag, ne_chunk

text = "Barack Obama was born in Hawaii."
tokens = word_tokenize(text)
pos_tags = pos_tag(tokens)
ner_tree = ne_chunk(pos_tags)

print(ner_tree)
This example detects organizations and locations in a business sentence.
from nltk import word_tokenize, pos_tag, ne_chunk

text = "Apple is looking at buying U.K. startup for $1 billion"
tokens = word_tokenize(text)
pos_tags = pos_tag(tokens)
ner_tree = ne_chunk(pos_tags)

print(ner_tree)
Sample Model

This program finds named entities like people and places in the sentence and prints their type.

import nltk
from nltk import word_tokenize, pos_tag, ne_chunk

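# Download required NLTK data files (run once). NLTK 3.9 and later use the names
# 'punkt_tab', 'averaged_perceptron_tagger_eng', and 'maxent_ne_chunker_tab' instead.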
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

text = "Mark Zuckerberg founded Facebook in California."
tokens = word_tokenize(text)
pos_tags = pos_tag(tokens)
ner_tree = ne_chunk(pos_tags)

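print("Named Entities:")
for subtree in ner_tree:
    # Entity chunks are Tree objects with a label(); other tokens are (word, tag) tuples
    if hasattr(subtree, 'label'):
        entity_name = ' '.join(c[0] for c in subtree)  # join the words of the chunk
        entity_type = subtree.label()                  # e.g. PERSON, ORGANIZATION, GPE
        print(f"{entity_name}: {entity_type}")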
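The loop above prints entities one at a time. When a later step needs them as data, a small helper can collect them into a dictionary keyed by entity type. entities_by_type is a hypothetical helper, not part of NLTK. It uses isinstance(..., Tree) instead of hasattr(..., 'label'), which is equivalent here because only entity chunks are Tree objects. The usage runs on a hand-built tree with illustrative labels so it works without the downloaded models. In practice you would pass it the result of ne_chunk.

```python
from collections import defaultdict
from nltk import Tree

def entities_by_type(ner_tree):
    """Group the entity chunks of an ne_chunk-style tree by label.

    Hypothetical helper for illustration; not an NLTK function.
    """
    grouped = defaultdict(list)
    for subtree in ner_tree:
        if isinstance(subtree, Tree):  # entity chunks are Trees; other tokens are tuples
            name = " ".join(word for word, tag in subtree.leaves())
            grouped[subtree.label()].append(name)
    return dict(grouped)

# Hand-built stand-in for ne_chunk output (labels are illustrative)
sample = Tree("S", [
    Tree("PERSON", [("Mark", "NNP"), ("Zuckerberg", "NNP")]),
    ("founded", "VBD"),
    Tree("ORGANIZATION", [("Facebook", "NNP")]),
    ("in", "IN"),
    Tree("GPE", [("California", "NNP")]),
    (".", "."),
])

print(entities_by_type(sample))
# → {'PERSON': ['Mark Zuckerberg'], 'ORGANIZATION': ['Facebook'], 'GPE': ['California']}
```

Because the helper only reads labels and leaves, it works the same on any tree that ne_chunk produces.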
Important Notes

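NLTK's ne_chunk uses a pre-trained maximum entropy chunker for English. It works reasonably well on general English text, but it can miss or mislabel entities in informal or domain-specific text.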

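ne_chunk returns an nltk.Tree. Entity chunks are subtrees that have a label() such as PERSON, while all other tokens stay as plain (word, tag) tuples. You extract entities by checking which children have a label.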
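Besides walking the tree, you can flatten it into one (word, tag, IOB label) triple per token with nltk.chunk.tree2conlltags. In these labels, B- marks the first token of an entity, I- marks a continuation, and O marks a token outside any entity. This format is convenient for writing results to a file or comparing them token by token. The tree below is built by hand so the example runs without the NER model, and its labels are illustrative.

```python
from nltk import Tree
from nltk.chunk import tree2conlltags

# Hand-built stand-in for ne_chunk output (labels are illustrative)
tree = Tree("S", [
    Tree("PERSON", [("Barack", "NNP"), ("Obama", "NNP")]),
    ("was", "VBD"),
    ("born", "VBN"),
    ("in", "IN"),
    Tree("GPE", [("Hawaii", "NNP")]),
    (".", "."),
])

# One (word, POS tag, IOB label) triple per token
triples = tree2conlltags(tree)
for word, pos, iob in triples:
    print(f"{word:<8} {pos:<4} {iob}")
```

nltk.chunk.conlltags2tree performs the reverse conversion, from triples back to a tree.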

Download the required NLTK data (punkt, averaged_perceptron_tagger, maxent_ne_chunker, and words) before running NER. Otherwise NLTK raises a LookupError.
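Calling nltk.download on every run may contact the download server even when the data is already installed. A common alternative is to look each resource up with nltk.data.find, which raises LookupError when the resource is missing, and to download only what is absent. The sketch below assumes the pre-3.9 resource names used in this article. Newer NLTK releases rename some of them (for example punkt_tab), so adjust the mapping for your version. missing_resources and ensure_resources are hypothetical helper names.

```python
import nltk

# Package name -> path that nltk.data.find looks up.
# Assumption: pre-3.9 names as used in this article; newer NLTK versions rename some.
NER_RESOURCES = {
    "punkt": "tokenizers/punkt",
    "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger",
    "maxent_ne_chunker": "chunkers/maxent_ne_chunker",
    "words": "corpora/words",
}

def missing_resources(resources):
    """Return the package names whose data nltk.data.find cannot locate."""
    missing = []
    for package, path in resources.items():
        try:
            nltk.data.find(path)
        except LookupError:
            missing.append(package)
    return missing

def ensure_resources(resources=NER_RESOURCES):
    """Download only the packages that are not installed yet (hypothetical helper)."""
    for package in missing_resources(resources):
        nltk.download(package, quiet=True)
```

Call ensure_resources() once at the top of a script, before the first word_tokenize call. Later runs then skip the download step when everything is already installed.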

Summary

NER finds names of people, places, and organizations in text.

NLTK provides easy tools to tokenize, tag, and recognize entities.

Use ne_chunk on POS-tagged tokens to get named entities.