For recognizing entity types like PERSON, ORG, LOC, and DATE, Precision and Recall are key. Precision tells us how many identified entities are correct. Recall tells us how many actual entities were found. We want both high because missing entities (low recall) or wrongly labeling text (low precision) hurts understanding.
Entity types (PERSON, ORG, LOC, DATE) in NLP - Model Metrics & Evaluation
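Confusion matrix (rows are true labels, columns are predicted labels):

| True \ Predicted | PERSON | ORG | LOC | DATE | None |
| --- | --- | --- | --- | --- | --- |
| PERSON | 40 | 2 | 1 | 0 | 7 |
| ORG | 3 | 35 | 2 | 0 | 5 |
| LOC | 1 | 2 | 38 | 1 | 8 |
| DATE | 0 | 0 | 1 | 45 | 4 |
| None | 5 | 4 | 6 | 3 | 377 |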
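This shows how many entities of each true type were predicted as each type or missed (None). For example, 40 PERSON entities were correctly found as PERSON (true positives for PERSON), and 7 PERSON entities were missed (predicted None). The None row works the other way: it counts non-entity text that the model wrongly tagged as an entity, such as the 5 cases predicted as PERSON.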
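As a minimal sketch, per-type precision and recall can be computed directly from the matrix above. Precision divides the diagonal count by its column total (everything predicted as that type), and recall divides it by its row total (everything that truly is that type). The names `LABELS`, `MATRIX`, and `precision_recall` are illustrative choices, not part of any library.

```python
# Confusion matrix from the text: rows are true labels, columns are predictions.
LABELS = ["PERSON", "ORG", "LOC", "DATE", "None"]
MATRIX = [
    [40, 2, 1, 0, 7],
    [3, 35, 2, 0, 5],
    [1, 2, 38, 1, 8],
    [0, 0, 1, 45, 4],
    [5, 4, 6, 3, 377],
]


def precision_recall(matrix, index):
    """Return (precision, recall) for the class at the given row/column index."""
    true_positives = matrix[index][index]
    predicted_total = sum(row[index] for row in matrix)  # column sum
    actual_total = sum(matrix[index])                    # row sum
    precision = true_positives / predicted_total if predicted_total else 0.0
    recall = true_positives / actual_total if actual_total else 0.0
    return precision, recall


# Score only the real entity types; "None" is the background class.
scores = {
    label: precision_recall(MATRIX, i)
    for i, label in enumerate(LABELS)
    if label != "None"
}

for label, (p, r) in scores.items():
    print(f"{label}: precision={p:.3f} recall={r:.3f}")
```

For PERSON this gives a precision of 40/49 (about 0.82) and a recall of 40/50 (0.80).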
If we want to avoid wrongly tagging words as entities (high precision), we might miss some real entities (lower recall). For example, in legal documents, wrongly tagging a word as a person could cause confusion, so precision is important.
But in news summarization, missing a person or location (low recall) means losing important info, so recall is more important.
Balancing precision and recall depends on the task's goal.
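One common way to express this balance as a single number is the F-beta score, which the text above does not name; it is offered here only as an illustrative option. A beta below 1 weights precision more heavily, and a beta above 1 weights recall more heavily. The sketch uses the PERSON counts from the confusion matrix (40 correct out of 49 predicted and 50 actual), and `f_beta` is a hypothetical helper name.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall; beta > 1 favors recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# PERSON scores taken from the confusion matrix in the text.
precision, recall = 40 / 49, 40 / 50

f_half = f_beta(precision, recall, 0.5)  # precision-leaning (e.g. legal documents)
f_one = f_beta(precision, recall, 1.0)   # balanced F1
f_two = f_beta(precision, recall, 2.0)   # recall-leaning (e.g. news summarization)

print(f"F0.5={f_half:.3f}  F1={f_one:.3f}  F2={f_two:.3f}")
```

Because PERSON precision is slightly higher than its recall here, the precision-leaning score comes out highest and the recall-leaning score lowest.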
- Good: Precision and Recall above 85% for all entity types means the model finds most entities correctly and rarely makes mistakes.
- Bad: Precision below 60% means many false entities are predicted, confusing users.
- Bad: Recall below 50% means many real entities are missed, losing key information.
- Accuracy paradox: Since most words are not entities, accuracy can be very high even if the model never finds entities.
- Data leakage: If the test data shares documents, or many of the same entity mentions, with the training data, metrics may look better than real performance on unseen text.
- Overfitting: Very high precision but low recall can mean the model only recognizes entities it memorized.
Your model has 98% accuracy but only 12% recall on PERSON entities. Is it good for production?
Answer: No. The high accuracy is misleading because most words are not entities. The very low recall means the model misses almost all PERSON entities, which is bad if you need to find people in text.
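The arithmetic behind this answer can be sketched with made-up counts. All numbers below are hypothetical, chosen only so that recall comes out at 12% and accuracy at roughly 98%; the sketch treats the problem as PERSON versus everything else and assumes no false positives.

```python
# Hypothetical token counts, chosen only to reproduce the scenario in the question.
total_tokens = 10_000
person_tokens = 200    # tokens that truly belong to PERSON entities
found_person = 24      # PERSON tokens the model tagged correctly
false_positives = 0    # assumption: the model never tags a non-PERSON token as PERSON

missed_person = person_tokens - found_person

# Every token is correct except the missed PERSON tokens and any false positives.
correct = total_tokens - missed_person - false_positives
accuracy = correct / total_tokens
recall = found_person / person_tokens

print(f"accuracy={accuracy:.2%} recall={recall:.2%}")
```

With these counts the model is wrong on only 176 of 10,000 tokens, yet it misses 176 of the 200 PERSON tokens.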
Practice
"Albert Einstein" in a text?
Solution
Step 1: Understand entity types
PERSON labels identify names of people in text.
Step 2: Match the example to entity type
"Albert Einstein" is a person's name, so it fits PERSON.
Final Answer:
PERSON -> Option A
Quick Check:
PERSON = Albert Einstein [OK]
- Confusing ORG with PERSON
- Labeling locations as PERSON
- Using DATE for names
"Google" in a named entity recognition task?
Solution
Step 1: Identify what Google represents
Google is a company, which is an organization.
Step 2: Match to entity type
ORG is the label for organizations like companies.
Final Answer:
ORG -> Option B
Quick Check:
ORG = Google [OK]
- Labeling companies as LOC
- Using PERSON for organizations
- Confusing DATE with ORG
"Barack Obama visited Paris on July 14, 2015." Which of the following is the correct sequence of entity types for [Barack Obama, Paris, July 14, 2015]?
Solution
Step 1: Identify each entity type
"Barack Obama" is a person, "Paris" is a location, and "July 14, 2015" is a date.
Step 2: Match entities to types in order
The sequence is PERSON, LOC, DATE.
Final Answer:
[PERSON, LOC, DATE] -> Option C
Quick Check:
PERSON, LOC, DATE = Barack Obama, Paris, July 14, 2015 [OK]
- Confusing ORG with LOC
- Mixing DATE with ORG
- Wrong order of entity types
"Amazon" as a LOC (location). What is the most likely error in this labeling?
Solution
Step 1: Understand the entity "Amazon"
Amazon is commonly known as a company (organization), not a location.
Step 2: Correct entity type for Amazon
ORG is the correct label for companies like Amazon.
Final Answer:
Amazon is an organization, so it should be ORG -> Option A
Quick Check:
ORG = Amazon company [OK]
- Assuming Amazon is only a location
- Labeling company names as PERSON
- Ignoring context of entity
"The conference was held in New York on March 3rd, 2023, and attended by experts from Google." Which entity types should your model identify to get the correct information?
Solution
Step 1: Identify entities to extract
The task is to extract dates and locations only.
Step 2: Match entity types for locations and dates
Locations are labeled LOC and dates are labeled DATE.
Final Answer:
LOC and DATE -> Option D
Quick Check:
LOC and DATE = New York, March 3rd, 2023 [OK]
- Extracting PERSON or ORG instead
- Mixing LOC with ORG
- Ignoring DATE entities
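The last exercise can be sketched as a simple filter over labeled entities. The `annotations` list below is hand-written for the conference sentence and is not the output of any model; `WANTED` and `selected` are illustrative names.

```python
# Hand-labeled entities for the conference sentence (not produced by a model).
annotations = [
    ("New York", "LOC"),
    ("March 3rd, 2023", "DATE"),
    ("Google", "ORG"),
]

# The task asks only for locations and dates, so ORG entities are dropped.
WANTED = {"LOC", "DATE"}
selected = [(text, label) for text, label in annotations if label in WANTED]

for text, label in selected:
    print(f"{label}: {text}")
```

Keeping the wanted labels in a set makes it easy to change the task, for example adding ORG if the attending companies also matter.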
