
Extracting with str.extract (regex) in Data Analysis Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
What is the output of this code using str.extract?
Given the pandas Series with mixed strings, what will be the output of extracting the first number using regex?
Data Analysis Python
import pandas as pd
s = pd.Series(['apple 123', 'banana 456', 'cherry 789'])
result = s.str.extract(r'(\d+)')
print(result)
A
     0
0  123
1  456
2  789
B
       0
0  apple
1  banana
2  cherry
C
Empty DataFrame
Columns: []
Index: [0, 1, 2]
D
0    123
1    456
2    789
dtype: object
💡 Hint
Remember that str.extract returns a DataFrame by default (expand=True), with one column per capture group.
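The return type can be checked directly. The following is a minimal sketch, assuming a hypothetical Series named codes that is not part of the challenge: with the default expand=True a single capture group still yields a one-column DataFrame, while expand=False yields a Series.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
codes = pd.Series(['x10', 'y20'])

as_frame = codes.str.extract(r'(\d+)')                 # default expand=True
as_series = codes.str.extract(r'(\d+)', expand=False)  # one group -> Series

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series
print(list(as_frame.columns))    # [0]
```

Unnamed groups are labelled by position, which is why the single column is called 0.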
Data Output
intermediate
How many rows have a match in this extraction?
Using str.extract with this regex, how many rows will have a non-null value?
Data Analysis Python
import pandas as pd
s = pd.Series(['cat123', 'dog', 'bird456', 'fish'])
result = s.str.extract(r'(\d+)')
count = result[0].notnull().sum()
print(count)
A
1
B
3
C
4
D
2
💡 Hint
Check which strings contain digits.
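Rows where the pattern finds nothing are filled with NaN, so counting matches comes down to counting non-null values. A small sketch, assuming a hypothetical Series named labels rather than the one from the challenge:

```python
import pandas as pd

# Hypothetical sample data, not the Series from the challenge.
labels = pd.Series(['a1', 'b', 'c22'])

extracted = labels.str.extract(r'(\d+)')
matched = extracted[0].notna()   # True where the pattern matched

print(extracted[0].tolist())     # the non-matching row holds NaN
print(int(matched.sum()))        # 2
```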
🔧 Debug
advanced
What error does this code raise?
What error will this code raise when trying to extract with an invalid regex pattern?
Data Analysis Python
import pandas as pd
s = pd.Series(['abc123', 'def456'])
result = s.str.extract(r'(\d++')
print(result)
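A
AttributeError: 'Series' object has no attribute 'str'
B
re.error, because the pattern fails to compile (on Python 3.11 and later, where ++ is a possessive quantifier: missing ), unterminated subpattern at position 0; on earlier versions: multiple repeat at position 4)
C
TypeError: expected string or bytes-like object
D
KeyError: 0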
💡 Hint
Look carefully at the regex pattern syntax.
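An invalid pattern fails when it is compiled, before any row is examined. The sketch below assumes a hypothetical pattern with an unclosed group (not the one from the challenge) and catches the resulting exception. The exact message, and the class name shown in tracebacks, can differ between Python versions, so no output is quoted here.

```python
import re
import pandas as pd

s = pd.Series(['abc123'])  # hypothetical sample data
bad_pattern = r'(\d+'      # hypothetical pattern: the group is never closed

try:
    s.str.extract(bad_pattern)
    caught = None
except re.error as exc:    # raised while the pattern is being compiled
    caught = exc

print(type(caught).__name__, caught)
```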
🚀 Application
advanced
Extract date components from strings
Given a Series of date strings like '2023-06-15', which option shows the output of this code, which extracts year, month, and day into separate columns?
Data Analysis Python
import pandas as pd
s = pd.Series(['2023-06-15', '2024-01-30', '2022-12-05'])
result = s.str.extract(r'(\d{4})-(\d{2})-(\d{2})')
print(result)
A
       0
0  2023-06-15
1  2024-01-30
2  2022-12-05
B
0    2023-06-15
1    2024-01-30
2    2022-12-05
dtype: object
C
      0   1   2
0  2023  06  15
1  2024  01  30
2  2022  12  05
D
Empty DataFrame
Columns: []
Index: [0, 1, 2]
💡 Hint
Each pair of parentheses captures one group as a column.
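Named groups make the resulting columns easier to read than positional labels. A minimal sketch, assuming a hypothetical single-row Series named dates; the group names year, month, and day are illustrative choices, not part of the challenge:

```python
import pandas as pd

dates = pd.Series(['1999-12-31'])  # hypothetical sample date

# Each (?P<name>...) group becomes a column labelled with that name.
parts = dates.str.extract(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')

print(list(parts.columns))   # ['year', 'month', 'day']
print(parts.loc[0, 'year'])  # 1999 (still a string, not an integer)
```

The extracted values remain strings; converting them to numbers would be a separate step.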
🧠 Conceptual
expert
Why does str.extract return a DataFrame instead of a Series?
Choose the best explanation for why pandas str.extract returns a DataFrame by default, even when extracting a single group.
A
Because a regex can contain multiple capture groups, so str.extract by default (expand=True) returns a DataFrame with one column per group for consistency.
B
Because str.extract only works on DataFrames, not Series.
C
Because pandas Series cannot hold string data, so DataFrame is used instead.
D
Because the output must always be numeric, and DataFrame enforces this.
💡 Hint
Think about how many groups regex can capture.
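One way to see the link between groups and columns is to compare the number of capture groups in a compiled pattern with the shape of the extracted result. This is a sketch only; the Series and both patterns are hypothetical examples.

```python
import re
import pandas as pd

s = pd.Series(['k=v'])  # hypothetical sample data

shapes = {}
for pattern in (r'(\w)=', r'(\w)=(\w)'):
    n_groups = re.compile(pattern).groups      # capture groups in the pattern
    shapes[n_groups] = s.str.extract(pattern).shape

print(shapes)  # {1: (1, 1), 2: (1, 2)}
```

In both cases the number of columns equals the number of capture groups, with one row per input string.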