Named entity recognition (NER) is a common task in information extraction. What does NER primarily do?
Think about extracting specific types of information such as people or places.
NER finds and labels entities like person names, organizations, locations, and dates in text, which is key for extracting structured data.
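To make the idea of "find and label" concrete, here is a toy sketch. It assumes a tiny hand-written lookup table (the hypothetical `GAZETTEER`) plus a date regex; a real NER system would normally use a trained model rather than a fixed list, so treat this only as an illustration of the input and output shape.

```python
import re

# Hypothetical hand-written gazetteer; a real NER system learns labels from data.
GAZETTEER = {
    "Alice Smith": "PERSON",
    "Acme Corp": "ORG",
    "Berlin": "LOC",
}
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def tag_entities(text):
    """Return (surface text, label, start offset) tuples sorted by position."""
    found = []
    # Look up every known name in the text.
    for name, label in GAZETTEER.items():
        for m in re.finditer(re.escape(name), text):
            found.append((m.group(), label, m.start()))
    # Dates are recognized by shape rather than by lookup.
    for m in DATE_PATTERN.finditer(text):
        found.append((m.group(), "DATE", m.start()))
    return sorted(found, key=lambda item: item[2])


sentence = "Alice Smith joined Acme Corp in Berlin on 2024-07-15."
entities = tag_entities(sentence)
for surface, label, start in entities:
    print(f"{label:<6} {surface!r} at offset {start}")
```

The output is a list of labeled spans, which is the structured data that later steps (such as relation extraction) would consume.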
Given the code below that extracts dates from text using a regex pattern, what is the output?
import re

text = 'The event is on 2024-07-15 and registration ends 2024-06-30.'
dates = re.findall(r'\d{4}-\d{2}-\d{2}', text)
print(dates)
Look at the regex pattern and what it matches.
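The regex matches every substring shaped like YYYY-MM-DD (four digits, two digits, two digits, separated by hyphens), so it finds both dates in the text and prints ['2024-07-15', '2024-06-30'].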
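One caveat worth seeing in code: the pattern checks only the shape of the string, not whether the date exists. The sketch below uses a made-up input string and a hypothetical helper, `is_real_date`, that filters regex candidates with `datetime.strptime` from the standard library.

```python
import re
from datetime import datetime

# Made-up input: the second value has the right shape but is not a real date.
text = "Valid: 2024-07-15. Shape only: 2024-13-45."
candidates = re.findall(r"\d{4}-\d{2}-\d{2}", text)


def is_real_date(value):
    """Return True if value parses as a calendar date in YYYY-MM-DD form."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


valid_dates = [d for d in candidates if is_real_date(d)]
print(candidates)   # ['2024-07-15', '2024-13-45']
print(valid_dates)  # ['2024-07-15']
```

Using the regex to propose candidates and a parser to validate them keeps the pattern simple while rejecting impossible dates.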
You want to extract not just entities but also the relationships between them (like 'works_for' or 'located_in'). Which model type is best for this task?
Think about models that classify pairs of entities for relationships.
Relation extraction models based on transformers classify relationships between entity pairs, making them ideal for this task.
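As a rough illustration of what "classifying entity pairs" means in practice, the sketch below builds the candidate inputs such a classifier might receive. The classifier itself is omitted. The entity spans, the `[E1]`/`[E2]` marker strings, and the helper `mark_pair` are all assumptions made for this example, not the API of any particular library.

```python
from itertools import permutations

sentence = "Alice works for Acme in Berlin."
# Hypothetical entity spans (text, label, start, end), as an NER step might produce.
entities = [
    ("Alice", "PERSON", 0, 5),
    ("Acme", "ORG", 16, 20),
    ("Berlin", "LOC", 24, 30),
]


def mark_pair(text, head, tail):
    """Wrap the head entity in [E1]...[/E1] and the tail in [E2]...[/E2]."""
    spans = [(head[2], head[3], "E1"), (tail[2], tail[3], "E2")]
    # Insert markers from right to left so earlier offsets stay valid.
    for start, end, tag in sorted(spans, reverse=True):
        text = f"{text[:start]}[{tag}]{text[start:end]}[/{tag}]{text[end:]}"
    return text


# Every ordered pair is a candidate; a trained classifier would then assign
# each one a label such as 'works_for', 'located_in', or 'no_relation'.
candidates = [
    (head[0], tail[0], mark_pair(sentence, head, tail))
    for head, tail in permutations(entities, 2)
]
for head_text, tail_text, marked in candidates:
    print(f"({head_text}, {tail_text}): {marked}")
```

Ordered pairs are used because many relations are directional: 'Alice works_for Acme' is not the same claim as 'Acme works_for Alice'.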
You have a model that extracts entities from text. Which metric best measures how well it finds the correct entities?
Consider metrics that balance precision and recall.
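The F1 score is the harmonic mean of precision (the share of predicted entities that are correct) and recall (the share of true entities that are found), which makes it the usual choice for evaluating entity extraction.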
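The arithmetic can be sketched as follows, assuming exact-match scoring where a prediction counts only if both the text and the label agree with the gold annotation. The function name `entity_f1` and the sample data are made up for illustration.

```python
def entity_f1(gold, predicted):
    """Return (precision, recall, f1) for exact-match entity sets."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up annotations: one wrong label ('Acme') and one spurious entity ('Monday').
gold = [("Alice", "PERSON"), ("Acme", "ORG"), ("Berlin", "LOC")]
predicted = [
    ("Alice", "PERSON"),
    ("Acme", "LOC"),
    ("Berlin", "LOC"),
    ("Monday", "DATE"),
]

precision, recall, f1 = entity_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.50 recall=0.67 f1=0.57
```

Two of the four predictions are correct (precision 0.50) and two of the three gold entities are found (recall 0.67), giving F1 = 4/7, roughly 0.57.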
Consider the code below, which tries to extract person names using a simple pattern. What does it print, and what would it print if the '+' quantifier were removed?
import re

text = 'Alice and Bob went to the market.'
pattern = r'[A-Z][a-z]+'
matches = re.findall(pattern, text)
print(matches)
Look carefully at what the regex pattern matches.
The pattern '[A-Z][a-z]+' matches one uppercase letter followed by one or more lowercase letters, so the code as written does find matches and prints ['Alice', 'Bob']. Without the '+' quantifier, '[A-Z][a-z]' matches only two-character sequences and would return ['Al', 'Bo']. Note that the pattern matches any capitalized word, not only names.
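The effect of the '+' quantifier can be checked directly by running both variants side by side. The variable names and the second test sentence are made up for this sketch.

```python
import re

text = "Alice and Bob went to the market."

# With '+': one uppercase letter followed by one or more lowercase letters.
with_plus = re.findall(r"[A-Z][a-z]+", text)
# Without '+': one uppercase letter followed by exactly one lowercase letter.
without_plus = re.findall(r"[A-Z][a-z]", text)

print(with_plus)     # ['Alice', 'Bob']
print(without_plus)  # ['Al', 'Bo']

# Capitalization alone is a weak signal: a sentence-initial word also matches.
false_positive = re.findall(r"[A-Z][a-z]+", "The market was busy.")
print(false_positive)  # ['The']
```

The last line shows why capitalization patterns are at best a rough first pass for names: any capitalized word qualifies.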