Recall & Review
beginner
What does the str.extract() method do in pandas?
It extracts capture groups from each string in a pandas Series (for example, a DataFrame column) using a regular expression (regex). For each string, the groups from the first match are returned as columns.
beginner
How do you specify which part of the string to extract using str.extract()?
You use parentheses () in the regex pattern to define capture groups. Only the text matched inside these groups is extracted.
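To make the role of the parentheses concrete, here is a minimal sketch. The sample strings and the group names order_id and status are made up for illustration and do not come from the cards above.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
orders = pd.Series(["order ID-1042 shipped", "order ID-77 pending"])

# The literal "ID-" must be present for a match, but only the digits
# inside the parentheses are returned.
ids = orders.str.extract(r"ID-(\d+)")

# A named group, (?P<name>...), becomes the column name in the result.
named = orders.str.extract(r"ID-(?P<order_id>\d+) (?P<status>\w+)")

print(ids)
print(named)
```

Unnamed groups get integer column labels (0, 1, ...), so naming the groups is often the more readable choice when the result is joined back onto a DataFrame.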
intermediate
What type of object does str.extract() return when extracting one group vs multiple groups?
With the default expand=True, it always returns a DataFrame with one column per group, even for a single group. With expand=False, a single group returns a Series, while multiple groups still return a DataFrame.
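The return type is easy to check directly. The sketch below uses a small made-up Series and compares a single group with and without expand=False against a two-group pattern.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
s = pd.Series(["a1", "b2", "c3"])

# One group, default expand=True: a DataFrame with a single column.
one_default = s.str.extract(r"(\d)")

# One group, expand=False: a Series.
one_series = s.str.extract(r"(\d)", expand=False)

# Two groups: a DataFrame with one column per group.
two = s.str.extract(r"([a-z])(\d)")

print(type(one_default).__name__, one_default.shape)
print(type(one_series).__name__, one_series.shape)
print(type(two).__name__, two.shape)
```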
beginner
Why is it useful to use str.extract() with regex in data cleaning?
It helps pull out specific parts of messy text data, like extracting dates, codes, or names, making the data easier to analyze.
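As one possible cleaning step, the sketch below pulls a four-digit year out of free-text notes and converts it to an integer. The notes are invented, and the example assumes every row contains a year, since astype(int) would fail on missing values.

```python
import pandas as pd

# Hypothetical messy text, invented for this illustration.
notes = pd.Series([
    "Invoice from 2021, paid late",
    "received: 2019/03",
    "FY2023 summary",
])

# expand=False returns a Series, which is convenient for a single group.
year_text = notes.str.extract(r"(\d{4})", expand=False)

# The extracted values are strings; convert them before doing arithmetic.
year = year_text.astype(int)

print(year.tolist())
```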
intermediate
Example: What does df['col'].str.extract(r'(\d{3})-(\d{2})') extract?
It extracts two groups of digits separated by a dash: the first group with exactly 3 digits, and the second group with exactly 2 digits, returning them as two columns.
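Running the pattern from this card on a small, made-up DataFrame shows the two resulting columns. The values in col are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({"col": ["123-45", "code 987-65 ok"]})

# Column 0 holds the three digits before the dash,
# column 1 holds the two digits after it.
parts = df["col"].str.extract(r"(\d{3})-(\d{2})")

print(parts)
```

The second row shows that the pattern is searched for anywhere in the string, not only at the start.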
What symbol in regex defines a capture group for str.extract()?
Parentheses () define capture groups in regex, which str.extract() uses to extract parts of strings.
If your regex has two capture groups, what does str.extract() return?
Two capture groups produce a DataFrame with one column per group.
Which pandas object can you use str.extract() on?
str.extract() is used on a pandas Series (a single column). For a DataFrame, select a column (a Series) and apply it to that.
What happens if the regex pattern does not match any part of the string?
If no match is found, str.extract() returns NaN for that row.
Which of these regex patterns extracts a 4-digit year from a string?
The pattern r'(\d{4})' captures four consecutive digits, suitable for a year. It also matches the first four digits of a longer number, so add \b word boundaries if that matters.
Explain how you would use str.extract() to pull out an area code from phone numbers in a pandas Series.
Think about the pattern for area codes like three digits inside parentheses or at the start.
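One possible approach to the area-code prompt is sketched below. It assumes US-style numbers written either as (415) 555-0132 or 212-555-0198; the sample numbers and the exact pattern are illustrative choices, not the only valid answer.

```python
import pandas as pd

# Hypothetical phone numbers, invented for this illustration.
phones = pd.Series(["(415) 555-0132", "212-555-0198", "555-0147"])

# ^\(?      optional opening parenthesis at the start
# (\d{3})   the area code, which is the only captured group
# \)?       optional closing parenthesis
# [\s-]?    optional space or dash separator
# \d{3}-\d{4}  the rest of the number, required so a bare
#              7-digit number is not mistaken for one with an area code
area = phones.str.extract(r"^\(?(\d{3})\)?[\s-]?\d{3}-\d{4}", expand=False)

print(area)
```

The last row has no area code, so the pattern does not match and that row comes back as NaN.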
Describe the difference in output when extracting one group versus multiple groups with str.extract().
Consider how pandas organizes extracted data based on number of groups.