
Named Entity Recognition basics in ML Python

Introduction
Named Entity Recognition (NER) helps computers find and label important items in text, such as names, places, and dates. This makes information easier to understand and organize. NER is useful in many everyday situations:
When you want to find all the people mentioned in a news article.
When you need to extract company names from customer reviews.
When you want to identify locations in travel blogs automatically.
When you want to organize emails by detecting dates and events.
When you want to help a chatbot understand user questions better by recognizing key terms.
Syntax
ML Python
model = SomeNERModel()
predictions = model.predict(text)
NER models take text as input and output labels for each word or phrase.
Labels usually include categories like PERSON, LOCATION, ORGANIZATION, and DATE.
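To make the input and output shape concrete, here is a minimal toy sketch in plain Python. It uses a hand-written lookup table instead of a trained model, so the `GAZETTEER` dictionary and `toy_ner` function are illustrative assumptions, not part of any real NER library:

```python
# Toy dictionary-based "NER": looks up tokens in a tiny hand-made gazetteer.
# This only illustrates the interface (text in, labeled tokens out);
# real NER models learn these labels from data instead of a fixed table.
GAZETTEER = {
    "Alice": "PERSON",
    "Paris": "LOCATION",
    "April": "DATE",
}

def toy_ner(text):
    """Return (token, label) pairs for tokens found in the gazetteer."""
    predictions = []
    for token in text.replace(".", "").split():
        if token in GAZETTEER:
            predictions.append((token, GAZETTEER[token]))
    return predictions

print(toy_ner("Alice went to Paris in April."))
# [('Alice', 'PERSON'), ('Paris', 'LOCATION'), ('April', 'DATE')]
```

A real model replaces the dictionary lookup with learned features, but the calling pattern and output format stay the same.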
Examples
The model finds names, places, and dates in the sentence.
ML Python
text = "Alice went to Paris in April."
predictions = model.predict(text)
# Output: [('Alice', 'PERSON'), ('Paris', 'LOCATION'), ('April', 'DATE')]
The model detects company and person names.
ML Python
text = "Google was founded by Larry Page and Sergey Brin."
predictions = model.predict(text)
# Output: [('Google', 'ORGANIZATION'), ('Larry Page', 'PERSON'), ('Sergey Brin', 'PERSON')]
Sample Model
This example trains a simple Named Entity Recognition model using a CRF on a tiny dataset. It then predicts entity labels for a new sentence.
ML Python
from sklearn_crfsuite import CRF

# Sample training data: words and their entity labels
train_sents = [[('John', 'B-PER'), ('lives', 'O'), ('in', 'O'), ('New', 'B-LOC'), ('York', 'I-LOC')]]

# Feature extractor for each word
def word2features(sent, i):
    word = sent[i][0]
    features = {
        'word.lower()': word.lower(),
        'word.isupper()': word.isupper(),
        'word.istitle()': word.istitle(),
        'word.isdigit()': word.isdigit(),
    }
    return features

def sent2features(sent):
    return [word2features(sent, i) for i in range(len(sent))]

def sent2labels(sent):
    return [label for token, label in sent]

X_train = [sent2features(s) for s in train_sents]
y_train = [sent2labels(s) for s in train_sents]

# Train CRF model
crf = CRF(algorithm='lbfgs', max_iterations=100)
crf.fit(X_train, y_train)

# Test sentence (the 'O' labels here are placeholders; only the words are used)
test_sent = [('Mary', 'O'), ('moved', 'O'), ('to', 'O'), ('Los', 'O'), ('Angeles', 'O')]
X_test = [word2features(test_sent, i) for i in range(len(test_sent))]

# Predict entity labels
predicted = crf.predict([X_test])[0]

# Print results
for (word, _), label in zip(test_sent, predicted):
    print(f"{word}: {label}")
Important Notes
NER models often use labels like B- (beginning), I- (inside), and O (outside) to mark entity spans.
Real NER models need much more data to work well.
Pretrained models, such as those shipped with spaCy or Hugging Face Transformers, can do NER out of the box, with no training on your part.
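The B-/I-/O scheme above can be decoded into whole entity spans with a few lines of plain Python. The `bio_to_spans` helper below is an illustrative sketch written for this lesson, not a function from any library:

```python
def bio_to_spans(tokens, labels):
    """Group BIO-tagged tokens into (entity_text, entity_type) spans."""
    spans = []
    current_tokens, current_type = [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):  # a new entity begins here
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label[2:]
        elif label.startswith("I-") and current_tokens:  # entity continues
            current_tokens.append(token)
        else:  # 'O', or a stray I- with no open entity
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:  # flush an entity that runs to the end
        spans.append((" ".join(current_tokens), current_type))
    return spans

tokens = ["John", "lives", "in", "New", "York"]
labels = ["B-PER", "O", "O", "B-LOC", "I-LOC"]
print(bio_to_spans(tokens, labels))
# [('John', 'PER'), ('New York', 'LOC')]
```

This is the same grouping step you would apply to the CRF's predicted labels in the sample model above.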
Summary
Named Entity Recognition finds and labels important words like names and places in text.
NER uses special labels to mark where entities start and continue.
Simple models can be trained on small data, but pretrained models work best for real tasks.