Data extraction from text in Prompt Engineering / GenAI - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
What is the main purpose of Named Entity Recognition (NER) in data extraction?

Named Entity Recognition (NER) is a common technique in data extraction from text. What does NER primarily do?

A. It identifies and classifies key information like names, dates, and locations in text.
B. It translates text from one language to another.
C. It summarizes long documents into short paragraphs.
D. It generates new text based on input prompts.
💡 Hint

Think about what kind of information you want to pull out from a text document.

Predict Output
intermediate
Output of simple regex extraction code

What is the output of the following Python code that extracts all email addresses from a text?

import re
text = 'Contact us at support@example.com or sales@example.org.'
emails = re.findall(r'\b[\w.-]+@[\w.-]+\.\w+\b', text)
print(emails)
A. ['support@example.com', 'sales@example.org']
B. ['support@example.com sales@example.org']
C. ['support@example', 'sales@example']
D. []
💡 Hint

Look at the regex pattern and what re.findall returns.
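As a rough illustration of the hint, the sketch below shows what re.findall returns with and without a capture group. The sample text and patterns are hypothetical and unrelated to the question above.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Tags: #nlp, #regex and #python."

# Without capture groups, findall returns each full match as a string.
tags = re.findall(r"#\w+", text)
print(tags)   # ['#nlp', '#regex', '#python']

# With one capture group, findall returns only the captured part.
names = re.findall(r"#(\w+)", text)
print(names)  # ['nlp', 'regex', 'python']
```

In both cases the result is a list with one element per non-overlapping match, not a single joined string.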

Model Choice
advanced
Best model type for extracting structured data from unstructured text

You want to extract structured information like product names, prices, and dates from customer reviews. Which model type is best suited for this task?

A. K-Means clustering for grouping similar texts
B. Generative Adversarial Network (GAN) for data generation
C. Convolutional Neural Network (CNN) for image classification
D. Recurrent Neural Network (RNN) or Transformer-based model for sequence labeling
💡 Hint

Think about models that handle sequences and labeling tasks.

Metrics
advanced
Evaluating extraction accuracy with precision and recall

You built a model to extract dates from text. On a test set, it found 80 dates, of which 60 were correct. The test set actually contains 100 dates. What are the precision and recall?

A. Precision = 0.80, Recall = 0.60
B. Precision = 0.60, Recall = 0.75
C. Precision = 0.75, Recall = 0.60
D. Precision = 0.60, Recall = 0.80
💡 Hint

Precision = correct found / total found; Recall = correct found / total actual.
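The two formulas in the hint can be sketched as a small helper. The function name precision_recall and the counts in the usage line are hypothetical and deliberately differ from the numbers in the question.

```python
def precision_recall(num_predicted, num_correct, num_actual):
    """Return (precision, recall) for one extraction run.

    num_predicted: items the extractor returned
    num_correct:   returned items that are actually right
    num_actual:    items present in the gold-standard test set
    """
    precision = num_correct / num_predicted if num_predicted else 0.0
    recall = num_correct / num_actual if num_actual else 0.0
    return precision, recall

# Hypothetical counts: 50 items returned, 40 of them correct, 80 in the gold set.
precision, recall = precision_recall(50, 40, 80)
print(precision, recall)  # 0.8 0.5
```

Returning 0.0 when a denominator is zero is one common convention, assumed here; some tools report the metric as undefined instead.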

🔧 Debug
expert
Why does this extraction code raise an error?

Consider this Python code snippet for extracting phone numbers. Why does it raise an error?

import re
text = 'Call me at 123-456-7890 or 987-654-3210.'
pattern = r'\d{3}-\d{3}-\d{4}'
matches = re.match(pattern, text)
print(matches.group())
A. The regex pattern is invalid and causes a SyntaxError.
B. re.match only checks the start of the string, so it returns None causing an AttributeError.
C. The text variable is empty, so no matches are found.
D. matches.group() is called before re.match is imported.
💡 Hint

Check how re.match works compared to re.findall or re.search.
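To make the comparison in the hint concrete, here is a minimal sketch that runs the three functions on the same input. The sample text and pattern are hypothetical and are not the ones from the question.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Order 42 shipped, order 57 pending."
pattern = r"\d+"

at_start = re.match(pattern, text)    # only tries position 0; None if no match there
anywhere = re.search(pattern, text)   # first match anywhere in the string, or None
all_hits = re.findall(pattern, text)  # list of every non-overlapping match

print(at_start)          # None
print(anywhere.group())  # 42
print(all_hits)          # ['42', '57']

# Checking for None first avoids calling .group() on a failed match.
first = anywhere.group() if anywhere else None
```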

Practice

1. What is the main goal of data extraction from text in AI?
easy
A. To find and pull out useful information like names and dates from text
B. To translate text from one language to another
C. To generate new text based on a prompt
D. To compress text files to save space

Solution

  1. Step 1: Understand the purpose of data extraction

    Data extraction means finding specific useful info inside text, such as names, dates, or places.
  2. Step 2: Compare options to the definition

    Only option A, "To find and pull out useful information like names and dates from text," matches this purpose. The other options describe different tasks: translation, text generation, and compression.
  3. Final Answer:

    To find and pull out useful information like names and dates from text -> Option A
  4. Quick Check:

    Data extraction = find useful info [OK]
Hint: Look for the option about finding info inside text
Common Mistakes:
  • Confusing extraction with translation
  • Thinking extraction means generating new text
  • Mixing extraction with file compression
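As a minimal sketch of what "pulling out useful information" can look like in practice, the snippet below uses regular expressions to collect dates and email addresses. The helper name extract_fields, the patterns, and the sample sentence are assumptions made for illustration; real extraction pipelines usually need more robust patterns or a trained model.

```python
import re

def extract_fields(text):
    """Pull ISO-style dates (YYYY-MM-DD) and email addresses out of free text."""
    return {
        "dates": re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text),
        "emails": re.findall(r"\b[\w.-]+@[\w.-]+\.\w+\b", text),
    }

# Hypothetical input sentence.
sample = "Invoice sent to billing@example.com on 2024-01-15, due 2024-02-14."
fields = extract_fields(sample)
print(fields)
# {'dates': ['2024-01-15', '2024-02-14'], 'emails': ['billing@example.com']}
```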
2. Which of the following is the correct way to call a function extract_entities with a text input doc in Python?
easy
A. extract_entities = doc()
B. extract_entities(doc)
C. extract_entities.doc()
D. extract_entities->doc()

Solution

  1. Step 1: Recall Python function call syntax

    In Python, to call a function with an argument, use function_name(argument).
  2. Step 2: Check each option

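    extract_entities(doc) is the standard call form: the function name followed by the argument in parentheses. Option A calls doc and rebinds the name extract_entities to the result, option C looks up an attribute named doc on the function, and option D uses an arrow operator that is not valid Python.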
  3. Final Answer:

    extract_entities(doc) -> Option B
  4. Quick Check:

    Function call = function_name(argument) [OK]
Hint: Remember Python calls use parentheses with arguments inside
Common Mistakes:
  • Using dot notation to call a function
  • Assigning function call to function name
  • Using arrow notation like other languages
3. Given this Python code using a simple extraction model:
text = "Alice met Bob on 2023-04-01 in Paris."
entities = extract_entities(text)
print(entities)

If extract_entities returns a list of tuples with (entity, type), what is the expected output?
medium
A. {'Alice': 'PERSON', 'Bob': 'PERSON', '2023-04-01': 'DATE', 'Paris': 'LOCATION'}
B. ['Alice', 'Bob', '2023-04-01', 'Paris']
C. None
D. [('Alice', 'PERSON'), ('Bob', 'PERSON'), ('2023-04-01', 'DATE'), ('Paris', 'LOCATION')]

Solution

  1. Step 1: Understand the function output format

    The function returns a list of tuples, each tuple has (entity, type).
  2. Step 2: Match output to expected format

    [('Alice', 'PERSON'), ('Bob', 'PERSON'), ('2023-04-01', 'DATE'), ('Paris', 'LOCATION')] matches a list of tuples with entity and type pairs. Option B is just a list of strings, option A is a dictionary, and option C is None.
  3. Final Answer:

    [('Alice', 'PERSON'), ('Bob', 'PERSON'), ('2023-04-01', 'DATE'), ('Paris', 'LOCATION')] -> Option D
  4. Quick Check:

    List of (entity, type) tuples = [('Alice', 'PERSON'), ('Bob', 'PERSON'), ('2023-04-01', 'DATE'), ('Paris', 'LOCATION')] [OK]
Hint: Look for list of tuples format with entity and type
Common Mistakes:
  • Confusing list of strings with list of tuples
  • Expecting dictionary instead of list
  • Assuming function returns None
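The question does not show how extract_entities is implemented. The toy version below is only a sketch: it assumes a hypothetical lookup table (KNOWN_ENTITIES) and a date pattern in place of a real NER model, and exists to show the list-of-tuples return shape and how it can be regrouped.

```python
import re

# Hypothetical lookup table standing in for a real NER model.
KNOWN_ENTITIES = {"Alice": "PERSON", "Bob": "PERSON", "Paris": "LOCATION"}
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

def extract_entities(text):
    """Return (entity, type) tuples in the order they appear in the text."""
    entities = []
    # Tokenize into ISO dates or plain words; the date alternative is tried first.
    for token in re.findall(r"\d{4}-\d{2}-\d{2}|\w+", text):
        if DATE_PATTERN.fullmatch(token):
            entities.append((token, "DATE"))
        elif token in KNOWN_ENTITIES:
            entities.append((token, KNOWN_ENTITIES[token]))
    return entities

entities = extract_entities("Alice met Bob on 2023-04-01 in Paris.")
print(entities)

# Regroup the tuples by type when a dictionary view is more convenient.
by_type = {}
for entity, label in entities:
    by_type.setdefault(label, []).append(entity)
print(by_type)
```

A list of tuples preserves order and allows the same entity text to appear more than once, which a dictionary keyed by entity would not.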
4. You have this code snippet:
def extract_entities(text):
    entities = []
    for word in text.split():
        if word.istitle():
            entities.append((word, 'PERSON'))
    return entities

text = "John and Mary went to London."
print(extract_entities(text))

What is the bug in this code for extracting entities?
medium
A. It only detects words starting with uppercase, missing multi-word names
B. It does not split text into words
C. It returns a string instead of a list
D. It crashes because of missing import

Solution

  1. Step 1: Analyze the extraction logic

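    The code uses word.istitle() to test whether each whitespace-separated word is title-cased (first letter uppercase, the rest lowercase) and labels every such word as 'PERSON'.
  2. Step 2: Identify limitation

    Each word is checked on its own, so a multi-word name such as 'Mary Smith' is returned as separate entities instead of one. The same heuristic also labels 'London.' as a PERSON, trailing period included, because capitalization alone cannot distinguish entity types.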
  3. Final Answer:

    It only detects words starting with uppercase, missing multi-word names -> Option A
  4. Quick Check:

    Single-word detection limitation = It only detects words starting with uppercase, missing multi-word names [OK]
Hint: Check if code handles multi-word names or just single words
Common Mistakes:
  • Thinking split() is missing
  • Assuming return type is wrong
  • Expecting import needed for this code
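One possible way to address the multi-word limitation is sketched below. The function name extract_capitalized_spans is hypothetical, and the approach is still a heuristic: it strips punctuation and merges consecutive capitalized words, but it returns untyped candidates because capitalization cannot tell a person from a place.

```python
import string

def extract_capitalized_spans(text):
    """Group runs of consecutive capitalized words into single candidate entities."""
    spans, current = [], []
    for raw in text.split():
        word = raw.strip(string.punctuation)
        if word and word[0].isupper():
            current.append(word)
        else:
            # A lowercase word ends any span in progress.
            if current:
                spans.append(" ".join(current))
            current = []
        # Trailing punctuation (comma, period) also ends the current span.
        if current and raw[-1] in string.punctuation:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

spans = extract_capitalized_spans("John Smith and Mary went to New York.")
print(spans)  # ['John Smith', 'Mary', 'New York']
```

Sentence-initial words such as "The" would still be picked up by this sketch, which is one reason trained NER models are usually preferred over capitalization rules.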
5. You want to extract dates and locations from a large text using a pretrained AI model. Which approach best improves accuracy and speed?
hard
A. Use a generic language model without any fine-tuning
B. Manually write rules to find dates and locations using string matching
C. Use a named entity recognition (NER) model fine-tuned on your domain data
D. Extract all capitalized words as locations and all numbers as dates

Solution

  1. Step 1: Consider model choice for extraction

    Fine-tuning a NER model on your specific domain helps it learn patterns and improves accuracy.
  2. Step 2: Compare other options

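    Manual rules are slow to write and brittle, a generic model without fine-tuning lacks domain knowledge, and simple heuristics such as "all capitalized words are locations" miss many cases and produce false matches.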
  3. Final Answer:

    Use a named entity recognition (NER) model fine-tuned on your domain data -> Option C
  4. Quick Check:

    Fine-tuned NER model = best accuracy and speed [OK]
Hint: Fine-tune NER models for best extraction results
Common Mistakes:
  • Relying on manual rules only
  • Using generic models without tuning
  • Using simple heuristics that miss cases