Named Entity Recognition (NER) is a common technique in data extraction from text. What does NER primarily do?
Think about what kind of information you want to pull out from a text document.
NER locates spans of text that refer to entities such as names, dates, and places, and assigns each span a category, which helps in extracting structured data from unstructured text.
What is the output of the following Python code that extracts all email addresses from a text?
import re
text = 'Contact us at support@example.com or sales@example.org.'
emails = re.findall(r'\b[\w.-]+@[\w.-]+\.\w+\b', text)
print(emails)
Look at the regex pattern and what re.findall returns.
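The regex matches email patterns and re.findall returns a list of all non-overlapping matches found in the text, so the code prints ['support@example.com', 'sales@example.org']. The period that ends the sentence is not included in the second address, because the pattern must finish on a word character at a word boundary.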
You want to extract structured information like product names, prices, and dates from customer reviews. Which model type is best suited for this task?
Think about models that handle sequences and labeling tasks.
RNNs and Transformer models are designed to process sequences and can label each token, making them suitable for extracting structured data from text.
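To make "label each token" concrete, here is a minimal sketch of how per-token labels could be turned into structured fields. It assumes the widely used BIO tagging convention (B- begins an entity, I- continues it, O is outside any entity), which the question itself does not specify. The tags below are hand-written stand-ins for what a trained RNN or Transformer tagger might output, and the function name decode_bio and the label names PRODUCT, PRICE, and DATE are illustrative only.

```python
def decode_bio(tokens, tags):
    """Group tokens tagged B-X / I-X into (label, text) spans."""
    spans = []
    current_label, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts: flush any entity in progress first.
            if current_tokens:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continuation of the entity currently being built.
            current_tokens.append(token)
        else:
            # "O" tag or a stray I- tag: close the current entity, if any.
            if current_tokens:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_tokens:
        spans.append((current_label, " ".join(current_tokens)))
    return spans

# Hypothetical tagger output for one review sentence (not real model predictions).
tokens = ["Bought", "the", "Acme", "Blender", "for", "$49.99", "on", "March", "3"]
tags = ["O", "O", "B-PRODUCT", "I-PRODUCT", "O", "B-PRICE", "O", "B-DATE", "I-DATE"]

spans = decode_bio(tokens, tags)
print(spans)  # [('PRODUCT', 'Acme Blender'), ('PRICE', '$49.99'), ('DATE', 'March 3')]
```

The model's job is to predict the tag sequence; the decoding step shown here is plain bookkeeping that converts those tags into records.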
You built a model to extract dates from text. On a test set, it found 80 dates, of which 60 were correct. The test set actually contains 100 dates. What are the precision and recall?
Precision = correct found / total found; Recall = correct found / total actual.
Precision = 60/80 = 0.75; Recall = 60/100 = 0.60.
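The same arithmetic can be checked with a short sketch. The function name precision_recall and its parameter names are illustrative and do not come from any library.

```python
def precision_recall(correct_found, total_found, total_actual):
    """Return (precision, recall) computed from raw counts."""
    # Guard against division by zero when nothing was found or nothing exists.
    precision = correct_found / total_found if total_found else 0.0
    recall = correct_found / total_actual if total_actual else 0.0
    return precision, recall

# Counts from the question: 80 dates found, 60 of them correct, 100 dates in the test set.
precision, recall = precision_recall(correct_found=60, total_found=80, total_actual=100)
print(f"precision={precision:.2f}, recall={recall:.2f}")  # precision=0.75, recall=0.60
```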
Consider this Python code snippet for extracting phone numbers. Why does it raise an error?
import re
text = 'Call me at 123-456-7890 or 987-654-3210.'
pattern = r'\d{3}-\d{3}-\d{4}'
matches = re.match(pattern, text)
print(matches.group())
Check how re.match works compared to re.findall or re.search.
re.match tries to match the pattern only at the start of the string. Since the text begins with 'Call me at' rather than a phone number, re.match returns None, so calling group() on it raises AttributeError: 'NoneType' object has no attribute 'group'.
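One possible fix, sketched on the assumption that the goal is to pull phone numbers from anywhere in the string: use re.search for the first occurrence, checking for None before calling group(), or re.findall to collect all of them.

```python
import re

text = 'Call me at 123-456-7890 or 987-654-3210.'
pattern = r'\d{3}-\d{3}-\d{4}'

# re.match anchors at position 0, so it finds nothing in this text.
anchored = re.match(pattern, text)

# re.search scans the whole string and returns the first match, or None.
first = re.search(pattern, text)
first_number = first.group() if first else None

# re.findall returns every non-overlapping match as a list of strings.
all_numbers = re.findall(pattern, text)

print(anchored)      # None
print(first_number)  # 123-456-7890
print(all_numbers)   # ['123-456-7890', '987-654-3210']
```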