Regex operations in Pandas

Introduction

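A regular expression (regex) describes a pattern in text. Pandas exposes regex through the str accessor, so you can search, extract, and replace text across a whole column in one call. Typical situations: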

You want to find rows where a column has emails.

You need to replace phone numbers in a text column with a standard format.

You want to filter data where names start with a certain letter.

You want to split a column by a pattern like commas or spaces.

You want to check if a text column contains a specific pattern.
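
Splitting is the one use case in this list that the rest of the page does not demonstrate, so here is a minimal sketch. The `tags` Series is made-up sample data, and the example relies on str.split treating a multi-character pattern as a regex.

```python
import pandas as pd

# Hypothetical sample data: values separated by commas and/or spaces
tags = pd.Series(['red, green,blue', 'cyan  magenta', 'black'])

# '[,\s]+' means "one or more commas or whitespace characters".
# str.split treats a multi-character pattern like this one as a regex.
pieces = tags.str.split(r'[,\s]+')

print(pieces.tolist())
# [['red', 'green', 'blue'], ['cyan', 'magenta'], ['black']]
```

Each row becomes a list of parts. Passing expand=True would spread the parts across separate columns instead.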

Syntax

Pandas

df['column'].str.contains('pattern', regex=True)
df['column'].str.replace('pattern', 'new_text', regex=True)
df['column'].str.extract('(pattern)')  # needs at least one capture group
df['column'].str.match('pattern')

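Use the str accessor to apply these methods to text columns.

str.contains treats the pattern as a regex by default. str.replace treats it as plain text by default since pandas 2.0, so pass regex=True explicitly when you want a regex. str.extract needs at least one capture group in parentheses, and str.match only looks for a match at the start of each value.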
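
To see what the regex flag actually changes, here is a small sketch on made-up data. The same pattern gives different results depending on whether it is read as a regex or as plain text.

```python
import pandas as pd

# Hypothetical sample values
s = pd.Series(['a.b', 'a-b', 'axb'])

# Read as a regex: '.' matches any single character
as_regex = s.str.replace('a.b', 'X', regex=True)

# Read as plain text: only the exact string 'a.b' is replaced
as_literal = s.str.replace('a.b', 'X', regex=False)

print(as_regex.tolist())    # ['X', 'X', 'X']
print(as_literal.tolist())  # ['X', 'a-b', 'axb']
```

Passing regex explicitly keeps the result the same across pandas versions.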

Examples

Flag rows where the 'email' column contains '@gmail.com'. The result is a True/False Series that you can use as a filter, for example df[mask].

Pandas

df['email'].str.contains(r'@gmail\.com')
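
Real columns often contain missing values and mixed case. The sketch below, on made-up data, shows one way to handle both with the case and na parameters of str.contains.

```python
import pandas as pd

# Hypothetical sample data with upper-case text and a missing value
emails = pd.Series(['alice@gmail.com', 'BOB@GMAIL.COM', None, 'dave@hotmail.com'])

# case=False ignores letter case; na=False counts missing values as "no match"
# '$' anchors the pattern to the end of the value
mask = emails.str.contains(r'@gmail\.com$', case=False, na=False)

print(mask.tolist())          # [True, True, False, False]
print(emails[mask].tolist())  # ['alice@gmail.com', 'BOB@GMAIL.COM']
```

Without na=False the missing row would produce a missing value in the mask, which cannot be used directly as a filter.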

Remove all non-digit characters from the 'phone' column.

Pandas

df['phone'].str.replace(r'\D', '', regex=True)
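
Stripping non-digits is often only the first step. As a sketch of rewriting phone numbers into one standard format, the example below uses capture groups and the backreferences \1, \2, \3 in the replacement. The target format 123-456-7890 is an assumption chosen for illustration.

```python
import pandas as pd

phones = pd.Series(['(123) 456-7890', '987-654-3210', '555 666 7777', '444.333.2222'])

# Step 1: keep digits only
digits = phones.str.replace(r'\D', '', regex=True)

# Step 2: capture 3 + 3 + 4 digits and join them with dashes
standard = digits.str.replace(r'^(\d{3})(\d{3})(\d{4})$', r'\1-\2-\3', regex=True)

print(standard.tolist())
# ['123-456-7890', '987-654-3210', '555-666-7777', '444-333-2222']
```

Values that do not have exactly 10 digits are left unchanged by step 2, because the pattern does not match them.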

Extract the first word from the 'name' column.

Pandas

df['name'].str.extract(r'^(\w+)')
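
str.extract returns a DataFrame by default, with one column per capture group. The sketch below, on made-up names, shows two common variations: expand=False to get a Series back, and named groups to get labelled columns.

```python
import pandas as pd

# Hypothetical sample data; the last value has no letters
names = pd.Series(['Alice Smith', 'Bob Jones', '42'])

# One group + expand=False returns a Series instead of a one-column DataFrame
first = names.str.extract(r'^([A-Za-z]+)', expand=False)

# Named groups (?P<name>...) become the column names of the result
parts = names.str.extract(r'^(?P<first>[A-Za-z]+)\s+(?P<last>[A-Za-z]+)$')

print(first.tolist()[:2])     # ['Alice', 'Bob']
print(list(parts.columns))    # ['first', 'last']
print(parts.loc[1, 'last'])   # Jones
```

Rows where the pattern does not match, like '42' here, come back as missing values.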

Check if each value in the 'code' column is 'A' followed by exactly 3 digits. str.match already starts at the beginning of the value; the $ at the end rules out extra characters.

Pandas

df['code'].str.match(r'^A\d{3}$')
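
It is easy to mix up str.contains, str.match, and str.fullmatch. The sketch below runs the same unanchored pattern through all three on made-up codes to show where each one looks for a match.

```python
import pandas as pd

# Hypothetical sample codes
codes = pd.Series(['A123', 'A1234', 'XA123'])
pattern = r'A\d{3}'  # no ^ or $ anchors on purpose

anywhere = codes.str.contains(pattern)   # match anywhere in the value
at_start = codes.str.match(pattern)      # match must begin at the start
whole = codes.str.fullmatch(pattern)     # match must cover the whole value

print(anywhere.tolist())  # [True, True, True]
print(at_start.tolist())  # [True, True, False]
print(whole.tolist())     # [True, False, False]
```

Using str.fullmatch gives the same result as adding ^ and $ to the pattern with str.match.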

Sample Program

This program shows how to use regex in Pandas to filter, clean, extract, and match text data.

Pandas

import pandas as pd

data = {
    'email': ['alice@gmail.com', 'bob@yahoo.com', 'carol@gmail.com', 'dave@hotmail.com'],
    'phone': ['(123) 456-7890', '987-654-3210', '555 666 7777', '444.333.2222'],
    'name': ['Alice Smith', 'Bob Jones', 'Carol White', 'Dave Black'],
    'code': ['A123', 'B234', 'A999', 'C456']
}

df = pd.DataFrame(data)

# Find emails with gmail
gmail_filter = df['email'].str.contains(r'@gmail\.com')

# Clean phone numbers to digits only
clean_phones = df['phone'].str.replace(r'\D', '', regex=True)

# Extract first name
first_names = df['name'].str.extract(r'^(\w+)')

# Check codes starting with A and 3 digits
code_match = df['code'].str.match(r'^A\d{3}$')

print('Rows with Gmail emails:')
print(df[gmail_filter])

print('\nClean phone numbers:')
print(clean_phones)

print('\nFirst names extracted:')
print(first_names)

print('\nCodes matching pattern:')
print(code_match)

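Output

The Gmail filter keeps rows 0 and 2 (alice and carol). The cleaned phone numbers are 1234567890, 9876543210, 5556667777, and 4443332222. The extracted first names are Alice, Bob, Carol, and Dave. The code check gives True, False, True, False.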

Important Notes

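Regex patterns use special sequences such as \d for a digit and \w for a word character (a letter, digit, or underscore).

In a normal Python string, every backslash in a regex must be doubled, for example '\\d'.

Raw strings avoid the doubling: write r'\d' and Python passes the backslash to the regex engine unchanged.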
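
As a quick check of the backslash rule, this sketch builds the same pattern twice, once with doubled backslashes and once as a raw string, and uses it on a made-up value.

```python
import pandas as pd

# The same regex written two ways
doubled = '\\d{3}-\\d{4}'   # normal string: each backslash is doubled
raw = r'\d{3}-\d{4}'        # raw string: backslashes written once

print(doubled == raw)  # True

# Hypothetical sample value
s = pd.Series(['tel: 555-1234'])
hits = s.str.contains(raw)
print(hits.tolist())   # [True]
```

Both spellings produce the identical pattern; the raw string is simply easier to read.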

Summary

Regex in Pandas helps find and change text patterns in columns.

Use str.contains, str.replace, str.extract, and str.match for common tasks.

Always test your regex on sample data to avoid mistakes.