
Data extraction from text in Prompt Engineering / GenAI - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
What is the main purpose of Named Entity Recognition (NER) in data extraction?

Named Entity Recognition (NER) is a common technique in data extraction from text. What does NER primarily do?

A. It identifies and classifies key information like names, dates, and locations in text.
B. It translates text from one language to another.
C. It summarizes long documents into short paragraphs.
D. It generates new text based on input prompts.
💡 Hint

Think about what kind of information you want to pull out from a text document.

Predict Output
intermediate
Output of simple regex extraction code

What is the output of the following Python code that extracts all email addresses from a text?

import re
text = 'Contact us at support@example.com or sales@example.org.'
emails = re.findall(r'\b[\w.-]+@[\w.-]+\.\w+\b', text)
print(emails)
A. ['support@example.com', 'sales@example.org']
B. ['support@example.com sales@example.org']
C. ['support@example', 'sales@example']
D. []
💡 Hint

Look at the regex pattern and what re.findall returns.
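
To make the hint concrete, here is a minimal sketch of what re.findall returns, using a made-up string and a date pattern rather than the challenge's email pattern. The sample text and both patterns are assumptions chosen only for illustration.

```python
import re

# Hypothetical sample text, unrelated to the challenge code.
sample = "Released 2023-01-15, patched 2023-02-01."

# With no capturing groups, re.findall returns a list of the full matches.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", sample)

# With capturing groups, it returns a list of tuples, one element per group.
parts = re.findall(r"(\d{4})-(\d{2})-(\d{2})", sample)

print(dates)
print(parts)
```

In both cases the result is a list with one item per non-overlapping match, in the order the matches appear in the string.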

Model Choice
advanced
Best model type for extracting structured data from unstructured text

You want to extract structured information like product names, prices, and dates from customer reviews. Which model type is best suited for this task?

A. K-Means clustering for grouping similar texts
B. Generative Adversarial Network (GAN) for data generation
C. Convolutional Neural Network (CNN) for image classification
D. Recurrent Neural Network (RNN) or Transformer-based model for sequence labeling
💡 Hint

Think about models that handle sequences and labeling tasks.
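
For readers unfamiliar with sequence labeling, the sketch below shows one common way its output is turned into structured fields: each token gets a BIO tag (B- begins an entity, I- continues it, O is outside), and consecutive tags are grouped into spans. The function name decode_bio, the tokens, and the tag labels are all hypothetical; no real model is called, and the tags are written by hand.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, label) pairs."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag closes any open entity and starts a new one.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_label))
            current_tokens = [token]
            current_label = tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag with the same label extends the open entity.
            current_tokens.append(token)
        else:
            # "O" (or an I- tag that does not continue the open entity)
            # closes any open entity; the token itself is not kept.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_label))
            current_tokens = []
            current_label = None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_label))
    return entities


# Hand-written tags standing in for a sequence-labeling model's predictions.
tokens = ["The", "Acme", "Kettle", "cost", "$", "25", "on", "May", "3"]
tags = ["O", "B-PRODUCT", "I-PRODUCT", "O", "B-PRICE", "I-PRICE",
        "O", "B-DATE", "I-DATE"]

result = decode_bio(tokens, tags)
print(result)
```

The decoding step is independent of the model that produces the tags; it only assumes one tag per token.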

Metrics
advanced
Evaluating extraction accuracy with precision and recall

You built a model to extract dates from text. On a test set, it found 80 dates, of which 60 were correct. The test set actually contains 100 dates. What are the precision and recall?

A. Precision = 0.80, Recall = 0.60
B. Precision = 0.60, Recall = 0.75
C. Precision = 0.75, Recall = 0.60
D. Precision = 0.60, Recall = 0.80
💡 Hint

Precision = correct found / total found; Recall = correct found / total actual.
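
The two formulas in the hint can be sketched as a small helper. The function name precision_recall and the counts passed to it are hypothetical and deliberately differ from the numbers in the question.

```python
def precision_recall(correct_found: int, total_found: int, total_actual: int):
    """Return (precision, recall) computed from raw extraction counts."""
    # Precision: share of extracted items that were correct.
    precision = correct_found / total_found if total_found else 0.0
    # Recall: share of true items that were extracted.
    recall = correct_found / total_actual if total_actual else 0.0
    return precision, recall


# Made-up counts: 50 items extracted, 40 of them correct, 80 true items exist.
p, r = precision_recall(correct_found=40, total_found=50, total_actual=80)
print(f"precision={p:.2f}, recall={r:.2f}")
```

Returning 0.0 when a denominator is zero is one convention among several; some libraries warn or return NaN instead.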

🔧 Debug
expert
Why does this extraction code raise an error?

Consider this Python code snippet for extracting phone numbers. Why does it raise an error?

import re
text = 'Call me at 123-456-7890 or 987-654-3210.'
pattern = r'\d{3}-\d{3}-\d{4}'
matches = re.match(pattern, text)
print(matches.group())
A. The regex pattern is invalid and causes a SyntaxError.
B. re.match only checks the start of the string, so it returns None causing an AttributeError.
C. The text variable is empty, so no matches are found.
D. matches.group() is called before re.match is imported.
💡 Hint

Check how re.match works compared to re.findall or re.search.
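
To follow up on the hint, this minimal sketch contrasts the three functions on a made-up string (the sample text and pattern are assumptions, not the challenge code). It checks for None before calling .group(), which is the usual way to handle a function that may return no match.

```python
import re

# Hypothetical sample text; the first number does not sit at position 0.
sample = "Order 42 shipped, order 77 pending."
pattern = r"\d+"

at_start = re.match(pattern, sample)     # tries to match only at the beginning
first = re.search(pattern, sample)       # scans for the first match anywhere
all_hits = re.findall(pattern, sample)   # collects every non-overlapping match

print(at_start)
print(first.group() if first else None)
print(all_hits)
```

Both re.match and re.search return either a match object or None, so guarding the .group() call applies to either one.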