Challenge - 5 Problems
Regex Extraction Master
❓ Predict Output
Difficulty: intermediate. Time limit: 2:00.
What is the output of this code using str.extract?
Given the pandas Series with mixed strings, what will be the output of extracting the first number using regex?
Data Analysis Python
import pandas as pd
s = pd.Series(['apple 123', 'banana 456', 'cherry 789'])
result = s.str.extract(r'(\d+)')
print(result)
💡 Hint
Remember str.extract returns a DataFrame with matched groups as columns.
The regex '(\d+)' captures one or more digits. str.extract returns a DataFrame with one column named 0 containing the matched digits as strings.
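As a rough illustration of the explanation above, the sketch below checks the shape of the result and shows that the captured digits are text, so they need an explicit conversion before any arithmetic. The variable names (`is_frame`, `column_labels`, `numbers`, `total`) are only illustrative and do not come from the challenge.

```python
import pandas as pd

s = pd.Series(['apple 123', 'banana 456', 'cherry 789'])
result = s.str.extract(r'(\d+)')

# One capture group gives a DataFrame with a single column labelled 0.
is_frame = isinstance(result, pd.DataFrame)
column_labels = list(result.columns)

# The captured values are strings, so convert them before summing.
numbers = result[0].astype(int)
total = int(numbers.sum())

print(is_frame, column_labels, total)  # True [0] 1368
```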
❓ Data Output
Difficulty: intermediate. Time limit: 1:30.
How many rows have a match in this extraction?
Using str.extract with this regex, how many rows will have a non-null value?
Data Analysis Python
import pandas as pd
s = pd.Series(['cat123', 'dog', 'bird456', 'fish'])
result = s.str.extract(r'(\d+)')
count = result[0].notnull().sum()
print(count)
💡 Hint
Check which strings contain digits.
Only 'cat123' and 'bird456' contain digits, so 2 rows have matches.
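One possible way to see which rows matched, rather than only counting them, is to reuse the non-null mask as a filter on the original Series. This is a minimal sketch; `has_match`, `matched_rows`, and `unmatched_rows` are names chosen here for illustration.

```python
import pandas as pd

s = pd.Series(['cat123', 'dog', 'bird456', 'fish'])
result = s.str.extract(r'(\d+)')

# Rows without any digits hold a missing value in column 0.
has_match = result[0].notna()

# The boolean mask shares the index of s, so it can filter s directly.
matched_rows = s[has_match].tolist()
unmatched_rows = s[~has_match].tolist()

print(matched_rows)    # ['cat123', 'bird456']
print(unmatched_rows)  # ['dog', 'fish']
```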
🔧 Debug
Difficulty: advanced. Time limit: 2:00.
What error does this code raise?
What error will this code raise when trying to extract with an invalid regex pattern?
Data Analysis Python
import pandas as pd
s = pd.Series(['abc123', 'def456'])
result = s.str.extract(r'(\d++')
print(result)
💡 Hint
Look carefully at the regex pattern syntax.
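The regex '(\d++' is invalid: its capture group is never closed, so the pattern fails to compile and Python raises re.error before any extraction happens. On Python versions before 3.11 the doubled '+' is rejected as well ("multiple repeat"); from 3.11 on, '++' is parsed as a possessive quantifier, and the missing ')' is what triggers the error.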
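The sketch below shows one way to observe the failure without crashing the script, and then runs a corrected pattern. It assumes the intended pattern was `(\d+)`, which the challenge does not state explicitly. The error message text is not shown because it differs between Python versions.

```python
import re
import pandas as pd

s = pd.Series(['abc123', 'def456'])

raised = False
message = ''
try:
    # The capture group is never closed, so compiling the pattern fails.
    s.str.extract(r'(\d++')
except re.error as exc:
    raised = True
    message = str(exc)  # wording depends on the Python version

print(raised, message)

# Assumed intent: capture the run of digits in each string.
fixed = s.str.extract(r'(\d+)')
print(fixed[0].tolist())  # ['123', '456']
```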
🚀 Application
Difficulty: advanced. Time limit: 2:30.
Extract date components from strings
Given a Series of date strings like '2023-06-15', which option correctly extracts year, month, and day into separate columns?
Data Analysis Python
import pandas as pd
s = pd.Series(['2023-06-15', '2024-01-30', '2022-12-05'])
result = s.str.extract(r'(\d{4})-(\d{2})-(\d{2})')
print(result)
💡 Hint
Each pair of parentheses captures one group as a column.
The regex has three groups capturing year, month, and day. str.extract returns a DataFrame with three columns named 0, 1, 2.
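If numeric column labels are inconvenient, named capture groups are one option: `str.extract` uses the group names as column labels. The sketch below is a variation on the challenge code, not part of it; the group names `year`, `month`, and `day` are chosen here for illustration.

```python
import pandas as pd

s = pd.Series(['2023-06-15', '2024-01-30', '2022-12-05'])

# (?P<name>...) names a group; the names become the column labels.
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
parts = s.str.extract(pattern)

column_labels = list(parts.columns)
years = parts['year'].astype(int).tolist()

print(column_labels)  # ['year', 'month', 'day']
print(years)          # [2023, 2024, 2022]
```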
🧠 Conceptual
Difficulty: expert. Time limit: 1:30.
Why does str.extract return a DataFrame instead of a Series?
Choose the best explanation for why pandas str.extract returns a DataFrame even when extracting a single group.
💡 Hint
Think about how many groups regex can capture.
A regex can capture multiple groups, so by default (expand=True) pandas returns a DataFrame with one column per group, even when there is only one group, to handle all cases uniformly.
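The DataFrame result is the default rather than the only possibility. As a small sketch, passing `expand=False` with a single capture group yields a Series, while a pattern with several groups still yields a DataFrame. The sample data and variable names below are illustrative.

```python
import pandas as pd

s = pd.Series(['apple 123', 'banana 456'])

# Default (expand=True): always a DataFrame, one column per group.
as_frame = s.str.extract(r'(\d+)')

# expand=False with exactly one group: a Series.
as_series = s.str.extract(r'(\d+)', expand=False)

# expand=False with two groups: still a DataFrame.
two_groups = s.str.extract(r'([a-z]+) (\d+)', expand=False)

print(type(as_frame).__name__)    # DataFrame
print(type(as_series).__name__)   # Series
print(type(two_groups).__name__)  # DataFrame
```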