We use str.extract to pull specific parts out of text data using regular expression patterns. It helps us find useful pieces in messy text and save them in their own columns.
Extracting with str.extract (regex) in Data Analysis Python
Introduction
You want to get phone numbers from a list of customer messages.
You need to find dates hidden inside product reviews.
You want to separate area codes from full phone numbers in a contact list.
You need to extract email usernames from email addresses.
You want to pull out hashtags from social media posts.
Syntax
Data Analysis Python
Series.str.extract(pat, flags=0, expand=True)
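pat is the pattern you want to find, written as a regular expression (regex). It must contain at least one capture group in parentheses.
flags takes flags from Python's re module, such as re.IGNORECASE. The default 0 means no flags.
If expand=True, the result is always a DataFrame with one column per capture group. If expand=False, a pattern with one capture group returns a Series, while a pattern with several groups still returns a DataFrame.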
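To see how expand and flags affect the result, here is a minimal sketch. The sample Series and the variable names are assumptions made up for illustration.

```python
import re
import pandas as pd

# Hypothetical sample data, not taken from the text above
s = pd.Series(["Order A12", "order b7", "no code here"])

# One capture group, expand=True (the default): a DataFrame with one column
as_frame = s.str.extract(r"order ([a-z]\d+)", flags=re.IGNORECASE)

# One capture group, expand=False: a Series
as_series = s.str.extract(r"order ([a-z]\d+)", flags=re.IGNORECASE, expand=False)

# Two capture groups: a DataFrame even when expand=False
two_groups = s.str.extract(r"order ([a-z])(\d+)", flags=re.IGNORECASE, expand=False)

print(type(as_frame).__name__)    # DataFrame
print(type(as_series).__name__)   # Series
print(type(two_groups).__name__)  # DataFrame
```

Using expand=False with a single group is handy when you want to assign the result straight to a new column.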
Examples
Extracts the first run of three digits (such as the area code) from each phone number.
Data Analysis Python
df['phone'].str.extract(r'(\d{3})')
Extracts dates in YYYY-MM-DD format from text.
Data Analysis Python
df['text'].str.extract(r'(\d{4}-\d{2}-\d{2})')
Extracts the username part before '@' in email addresses.
Data Analysis Python
df['email'].str.extract(r'([^@]+)@')
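The patterns above each use one capture group. As a sketch of how several groups behave, the example below splits a phone number into two named parts and stores email usernames in a new column. The contact list, the column names, and the group names area and number are all made up for illustration.

```python
import pandas as pd

# Hypothetical contact list, made up for this example
df = pd.DataFrame({
    "phone": ["415-555-1234", "212-555-9876", "unknown"],
    "email": ["alice@example.com", "bob.smith@example.org", "not-an-email"],
})

# Named groups (?P<name>...) become the column names of the result
phone_parts = df["phone"].str.extract(r"(?P<area>\d{3})-(?P<number>\d{3}-\d{4})")

# A single group with expand=False gives a Series that can be assigned as a column
df["username"] = df["email"].str.extract(r"([^@]+)@", expand=False)

print(phone_parts)
print(df[["email", "username"]])
```

Rows that do not fit the pattern, such as "unknown" and "not-an-email", get NaN in every extracted column.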
Sample Program
This code creates a small table with messages. It then extracts area codes, dates, and email usernames using str.extract with regex patterns.
Data Analysis Python
import pandas as pd

data = {'messages': ['Call me at 415-555-1234',
                     'My birthday is 1990-05-21',
                     'Email: user@example.com']}
df = pd.DataFrame(data)

# Extract area code: the 3 digits that start a phone number.
# A bare (\d{3}) would also match '199' in the date on the second row.
area_codes = df['messages'].str.extract(r'(\d{3})-\d{3}-\d{4}')

# Extract date in YYYY-MM-DD format
dates = df['messages'].str.extract(r'(\d{4}-\d{2}-\d{2})')

# Extract username from email
usernames = df['messages'].str.extract(r'([\w\.]+)@')

print('Area Codes:')
print(area_codes)
print('\nDates:')
print(dates)
print('\nUsernames:')
print(usernames)
Output
Area Codes:
     0
0  415
1  NaN
2  NaN

Dates:
            0
0         NaN
1  1990-05-21
2         NaN

Usernames:
      0
0   NaN
1   NaN
2  user
Important Notes
Regex patterns use special symbols to match text. For example, \d means any digit.
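If no match is found, str.extract returns NaN for that row. If a row contains several matches, only the first one is returned; use str.extractall to get all of them.
Use parentheses () in regex to capture the part you want to extract. A pattern with no capture group raises a ValueError.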
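The notes above can be checked with a short sketch. The posts below are made-up sample data, and the variable names are only for illustration.

```python
import pandas as pd

# Hypothetical posts, made up for this example
posts = pd.Series(["Loving #pandas and #python", "no tags here"])

# str.extract keeps only the first match in each row; no match gives NaN
first_tag = posts.str.extract(r"#(\w+)", expand=False)
print(first_tag.tolist())  # ['pandas', nan]

# str.extractall returns every match, indexed by row and match number
all_tags = posts.str.extractall(r"#(\w+)")
print(all_tags[0].tolist())  # ['pandas', 'python']

# A pattern without parentheses has nothing to capture and raises ValueError
try:
    posts.str.extract(r"#\w+")
except ValueError as err:
    print("ValueError:", err)
```

For tasks like collecting every hashtag in a post, str.extractall is usually the better fit, since str.extract stops at the first match.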
Summary
str.extract helps pull out parts of text using patterns.
By default it returns a DataFrame with one column per capture group, with NaN wherever nothing matches.
Useful for cleaning and organizing messy text data.