
Regex operations in Pandas

Introduction

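Regular expressions (regex) describe patterns in text. Pandas applies them to whole columns through the str accessor, so you can search, extract, and replace text without writing loops. Typical situations: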

You want to find rows where a column contains email addresses.
You need to replace phone numbers in a text column with a standard format.
You want to filter data where names start with a certain letter.
You want to split a column by a pattern like commas or spaces.
You want to check if a text column contains a specific pattern.
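The splitting case in the list above has no entry in the Syntax section, so here is a minimal sketch. It assumes pandas 1.4 or later, where str.split accepts a regex argument; the sample values and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: items separated by commas and/or spaces
tags = pd.Series(['a, b,c', 'd e'])

# Split on one or more commas or whitespace characters.
# regex=True is assumed to be available (pandas >= 1.4).
parts = tags.str.split(r'[,\s]+', regex=True)

print(parts.tolist())
```

Each row becomes a list of pieces; passing expand=True instead returns one column per piece.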
Syntax
Pandas
df['column'].str.contains('pattern', regex=True)
df['column'].str.replace('pattern', 'new_text', regex=True)
df['column'].str.extract('pattern')
df['column'].str.match('pattern')

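Use the str accessor to apply regex to text columns. The pattern passed to str.extract must contain at least one capture group (parentheses), and str.match only tests from the start of each string.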

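Set regex=True to treat the pattern as a regex. str.contains does this by default, but str.replace has treated the pattern as a literal string by default since pandas 2.0, so pass regex=True explicitly there.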
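A small sketch of what the regex flag changes, using made-up values. The same str.replace call is run once with regex=False and once with regex=True, so the result does not depend on which default your pandas version uses.

```python
import pandas as pd

# Hypothetical values: only the first one contains a literal '1.5'
s = pd.Series(['1.5', '1x5'])

# regex=False: the dot is a plain dot, so only '1.5' is replaced
literal = s.str.replace('1.5', 'N', regex=False)

# regex=True: the dot matches any character, so '1x5' is replaced too
as_regex = s.str.replace('1.5', 'N', regex=True)

print(literal.tolist())
print(as_regex.tolist())
```

Passing the flag explicitly keeps the behavior the same across pandas versions.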

Examples
Find rows where the 'email' column contains '@gmail.com'.
Pandas
df['email'].str.contains(r'@gmail\.com')
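Real columns often have missing values or mixed case. The sketch below assumes a made-up Series with one missing entry and uses the na and case parameters of str.contains so the result is a clean boolean mask.

```python
import pandas as pd

# Hypothetical data with a missing value and an upper-case address
emails = pd.Series(['alice@gmail.com', None, 'BOB@GMAIL.COM', 'carol@yahoo.com'])

# case=False ignores letter case; na=False treats missing values as "no match"
mask = emails.str.contains(r'@gmail\.com', case=False, na=False)

# The mask can be used directly for filtering
gmail_only = emails[mask]

print(mask.tolist())
print(gmail_only.tolist())
```

Without na=False the mask would contain a missing value, and filtering with it can raise an error.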
Remove all non-digit characters from the 'phone' column.
Pandas
df['phone'].str.replace(r'\D', '', regex=True)
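Stripping non-digits is often the first step toward a standard phone format. As a hedged sketch, the second step below uses capture groups and backreferences to rebuild the number; the 3-3-4 grouping and the dash separator are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical phone numbers in mixed formats
phones = pd.Series(['(123) 456-7890', '987-654-3210', '555 666 7777'])

# Step 1: keep digits only
digits = phones.str.replace(r'\D', '', regex=True)

# Step 2: capture three groups of digits and rejoin them with dashes.
# \1, \2 and \3 refer back to the captured groups.
formatted = digits.str.replace(
    r'^(\d{3})(\d{3})(\d{4})$', r'\1-\2-\3', regex=True
)

print(formatted.tolist())
```

Values that do not have exactly ten digits are left unchanged by the second step, which makes them easy to spot afterwards.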
Extract the first word from the 'name' column.
Pandas
df['name'].str.extract(r'^(\w+)')
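str.extract returns one column per capture group. The sketch below assumes made-up names and uses named groups, whose names become the column labels; the group names first and last are illustrative choices.

```python
import pandas as pd

# Hypothetical names; the last one has no surname
names = pd.Series(['Alice Smith', 'Bob Jones', 'Cher'])

# Named groups (?P<name>...) become column names in the result
parts = names.str.extract(r'^(?P<first>\w+)\s+(?P<last>\w+)')

print(parts)
```

Rows that do not match the whole pattern get missing values in every extracted column.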
Check whether each value in the 'code' column is 'A' followed by exactly 3 digits.
Pandas
df['code'].str.match(r'^A\d{3}$')
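str.match anchors only at the start of each string, which is why the pattern above ends with $. The sketch below compares str.contains, str.match, and str.fullmatch on the same unanchored pattern; the sample codes are made up, and str.fullmatch is assumed to be available (pandas 1.1 or later).

```python
import pandas as pd

# Hypothetical codes: exact, too long, and with a leading character
codes = pd.Series(['A123', 'A1234', 'XA123'])

# contains: the pattern may appear anywhere in the string
anywhere = codes.str.contains(r'A\d{3}')

# match: the pattern must start at the beginning of the string
at_start = codes.str.match(r'A\d{3}')

# fullmatch: the pattern must cover the entire string
whole = codes.str.fullmatch(r'A\d{3}')

print(anywhere.tolist())
print(at_start.tolist())
print(whole.tolist())
```

str.fullmatch with an unanchored pattern gives the same result as str.match with a trailing $.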
Sample Program

This program shows how to use regex in Pandas to filter, clean, extract, and match text data.

Pandas
import pandas as pd

data = {
    'email': ['alice@gmail.com', 'bob@yahoo.com', 'carol@gmail.com', 'dave@hotmail.com'],
    'phone': ['(123) 456-7890', '987-654-3210', '555 666 7777', '444.333.2222'],
    'name': ['Alice Smith', 'Bob Jones', 'Carol White', 'Dave Black'],
    'code': ['A123', 'B234', 'A999', 'C456']
}

df = pd.DataFrame(data)

# Find emails with gmail
gmail_filter = df['email'].str.contains(r'@gmail\.com')

# Clean phone numbers to digits only
clean_phones = df['phone'].str.replace(r'\D', '', regex=True)

# Extract first name
first_names = df['name'].str.extract(r'^(\w+)')

# Check codes that are 'A' followed by exactly 3 digits
code_match = df['code'].str.match(r'^A\d{3}$')

print('Rows with Gmail emails:')
print(df[gmail_filter])

print('\nClean phone numbers:')
print(clean_phones)

print('\nFirst names extracted:')
print(first_names)

print('\nCodes matching pattern:')
print(code_match)
Important Notes

Regex patterns use special sequences such as \d for a digit and \w for a word character (a letter, digit, or underscore).

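In a regular Python string, backslashes must be doubled (for example '\\d') so that they reach the regex engine unchanged.

Raw strings (for example r'\d') avoid the doubling and are the easier way to write regex patterns.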
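A short sketch of the backslash point, using made-up values: a doubled backslash in a regular string and a single backslash in a raw string produce the same pattern, while an escape such as \b means something different in a regular string.

```python
import pandas as pd

# The same regex written two ways
doubled = '\\d+'   # regular string, backslash doubled
raw = r'\d+'       # raw string, backslash written once
same = (doubled == raw)

# Either form works as a pattern
s = pd.Series(['abc123', 'no digits'])
has_digits = s.str.contains(raw)

# In a regular string '\b' is a backspace character, not a word boundary
backspace = '\b'
word_boundary = r'\b'

print(same)
print(has_digits.tolist())
print(len(backspace), len(word_boundary))
```

Writing every pattern as a raw string avoids this class of mistake.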

Summary

Regex in Pandas helps find and change text patterns in columns.

Use str.contains, str.replace, str.extract, and str.match for common tasks.

Always test your regex on sample data to avoid mistakes.