
Extracting with str.extract (regex) in Data Analysis Python - Cheat Sheet & Quick Revision

Recall & Review
beginner
What does the str.extract() method do in pandas?
It extracts capture groups from each string in a pandas Series (for example, a DataFrame column) using a regular expression (regex). For each string it returns the groups from the first match as new columns.
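A minimal sketch of a single-group extraction. The Series `labels` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: labels with an embedded numeric ID.
labels = pd.Series(["item-123", "item-45", "no id here"])

# One capture group: the run of digits that follows "item-".
# With the default expand=True the result is a DataFrame whose
# single column is named 0.
ids = labels.str.extract(r"item-(\d+)")

print(ids)
```

The extracted values are strings, so a numeric conversion such as `pd.to_numeric` is usually the next step if numbers are needed.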
beginner
How do you specify which part of the string to extract using str.extract()?
You use parentheses () in the regex pattern to define capture groups. Only the text matched inside these groups is extracted.
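A short sketch of how groups map to output columns, assuming a made-up Series of email-like strings. Named groups `(?P<name>...)` set the column names, and a non-capturing group `(?:...)` adds no column.

```python
import pandas as pd

# Hypothetical sample data.
emails = pd.Series(["ana@example.com", "bob@test.org", "not an email"])

# Two named capture groups become the columns "user" and "domain".
# (?:com|org) groups alternatives without capturing them.
parts = emails.str.extract(r"(?P<user>\w+)@(?P<domain>\w+)\.(?:com|org)")

print(parts)
```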
intermediate
What type of object does str.extract() return when extracting one group vs multiple groups?
With the default expand=True, it always returns a DataFrame with one column per group, even when there is only one group. With expand=False, a single group returns a Series, while multiple groups still return a DataFrame.
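The return type can be checked directly. This is a small sketch on made-up data; the variable names are illustrative only.

```python
import pandas as pd

s = pd.Series(["a1", "b2", "c3"])  # hypothetical sample data

# One group, default expand=True: a one-column DataFrame.
one_default = s.str.extract(r"(\d)")

# One group, expand=False: a Series.
one_series = s.str.extract(r"(\d)", expand=False)

# Two groups: a DataFrame with one column per group.
two_groups = s.str.extract(r"([a-z])(\d)")

print(type(one_default).__name__)
print(type(one_series).__name__)
print(two_groups.shape)
```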
beginner
Why is it useful to use str.extract() with regex in data cleaning?
It helps pull out specific parts of messy text data, like extracting dates, codes, or names, making the data easier to analyze.
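As an illustration of the cleaning use case, the sketch below pulls an ISO-style date out of free-text notes and converts it to a datetime column. The `note` and `order_date` column names and the sample rows are assumptions, not taken from any real dataset.

```python
import pandas as pd

# Hypothetical messy text column.
df = pd.DataFrame({
    "note": [
        "Order placed 2021-03-15 via web",
        "shipped on 2022-11-02",
        "date unknown",
    ]
})

# One group capturing a YYYY-MM-DD date; expand=False gives a Series.
date_text = df["note"].str.extract(r"(\d{4}-\d{2}-\d{2})", expand=False)

# Rows without a match are NaN and become NaT after conversion.
df["order_date"] = pd.to_datetime(date_text, format="%Y-%m-%d", errors="coerce")

print(df)
```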
intermediate
Example: What does df['col'].str.extract(r'(\d{3})-(\d{2})') extract?
It extracts two groups of digits separated by a dash: three digits before the dash and two digits after it, returned as two columns (0 and 1). The pattern is not anchored, so it can also match inside a longer run of digits unless you add boundaries such as \b.
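Running this pattern on a few made-up values shows both the normal case and two edge cases: a match inside longer digit runs, and a row with no match.

```python
import pandas as pd

# Hypothetical sample data for the 'col' column.
df = pd.DataFrame({"col": ["ID 123-45", "9876-543", "12-34"]})

result = df["col"].str.extract(r"(\d{3})-(\d{2})")

# Row 0: "123" and "45".
# Row 1: the unanchored pattern matches "876-54" inside "9876-543".
# Row 2: fewer than three digits before the dash, so both columns are NaN.
print(result)
```

If partial matches like row 1 are unwanted, a stricter pattern such as `r"\b(\d{3})-(\d{2})\b"` is one option.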
What symbol in regex defines a capture group for str.extract()?
A. ()
B. []
C. {}
D. <>
If your regex has two capture groups, what does str.extract() return?
A. A Series with one column
B. A DataFrame with two columns
C. A list of strings
D. A single string
Which pandas object can you use str.extract() on?
A. DataFrame
B. Series
C. Both DataFrame and Series
D. null
What happens if the regex pattern does not match any part of the string?
A. Returns the original string
B. Returns NaN for that row
C. Raises an error
D. Returns an empty string
Which of these regex patterns extracts a 4-digit year from a string?
A. r'(\d{4})'
B. r'\d{2}'
C. r'(\w{4})'
D. r'\d{5}'
Explain how you would use str.extract() to pull out an area code from phone numbers in a pandas Series.
Think about the pattern for area codes like three digits inside parentheses or at the start.
Describe the difference in output when extracting one group versus multiple groups with str.extract().
Consider how pandas organizes the extracted data based on the number of groups and the expand setting.