
str.contains() for pattern matching in Pandas

Introduction

We use str.contains() to check whether each text value in a column contains a certain pattern or word. It returns one True/False value per row, so we can quickly check or filter data based on text.

You want to find all rows in a table where a column has a specific word.
You need to filter emails that contain '@gmail.com'.
You want to check if product descriptions mention 'organic'.
You want to find phone numbers that start with a certain area code.
You want to detect if a comment contains a certain keyword.
Syntax
Pandas
DataFrame['column_name'].str.contains(pattern, case=True, na=False, regex=True)

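pattern is the text or regular expression you want to find.

case=True means matching is case sensitive. Use case=False to ignore case.

regex=True means pattern is treated as a regular expression. Use regex=False to match it as plain text.

na=False makes missing values count as "no match". This is not the default, so pass it explicitly; without it, missing values typically stay missing (NaN) in the result.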
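Here is a small sketch of how the case and na options change the result. The names in the Series are made up for illustration, and the exact value shown for the missing entry under the default settings can depend on your pandas version and column dtype.

```python
import pandas as pd

# Hypothetical data, made up for illustration; the last value is missing
names = pd.Series(['John Smith', 'johnny', 'Alice', None])

# Default settings: case sensitive, and na is not set
default_mask = names.str.contains('John')

# case=False ignores case; na=False turns the missing value into False
loose_mask = names.str.contains('john', case=False, na=False)

print(default_mask.tolist())
print(loose_mask.tolist())  # [True, True, False, False]
```

Only the second mask is guaranteed to hold nothing but True and False, which is what you need for filtering.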

Examples
Find rows where 'Name' contains 'John' (case sensitive). This is a substring match, so 'Johnny' and 'Johnson' match too.
Pandas
df['Name'].str.contains('John')
Find emails containing '@gmail.com' ignoring case differences. regex=False matches the text literally, so the dot is not treated as "any character".
Pandas
df['Email'].str.contains('@gmail.com', case=False, regex=False)
Find descriptions containing either 'organic' or 'natural' using regex.
Pandas
df['Description'].str.contains('organic|natural', regex=True)
Find phone numbers starting with any 3 digits (area code) using regex. The raw string (r'...') keeps the backslash in \d intact.
Pandas
df['Phone'].str.contains(r'^\d{3}', regex=True)
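To try the examples above, you can run them on a small table. This is only a sketch: the DataFrame below and all of its values are invented for illustration, and the email example uses regex=False so that the dot in '@gmail.com' is matched literally.

```python
import pandas as pd

# Hypothetical sample data, made up so that each example has hits and misses
df = pd.DataFrame({
    'Name': ['John Doe', 'Johnny Ray', 'Mary Johnson', 'john lee'],
    'Email': ['john@gmail.com', 'jr@GMAIL.COM', 'mary@gmailXcom.org', 'lee@yahoo.com'],
    'Description': ['organic tea', 'natural soap', 'plastic cup', 'Organic oil'],
    'Phone': ['555-1234', '415-0000', 'x55-9999', '12-345'],
})

# Substring match, case sensitive: 'Johnny' and 'Johnson' match, 'john' does not
name_mask = df['Name'].str.contains('John')

# Literal match, ignoring case: 'gmailXcom' does not match because the dot is literal
email_mask = df['Email'].str.contains('@gmail.com', case=False, regex=False)

# Regex alternation, case sensitive: 'Organic oil' does not match
desc_mask = df['Description'].str.contains('organic|natural', regex=True)

# Regex anchored at the start: the value must begin with three digits
phone_mask = df['Phone'].str.contains(r'^\d{3}', regex=True)

print(name_mask.tolist())   # [True, True, True, False]
print(email_mask.tolist())  # [True, True, False, False]
print(desc_mask.tolist())   # [True, True, False, False]
print(phone_mask.tolist())  # [True, True, False, False]
```

Each mask can be used to filter, for example df[phone_mask] keeps only the rows whose phone number starts with three digits.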
Sample Program

This program creates a table of products with descriptions. It then finds all products whose description mentions 'organic' or 'natural', ignoring case. It prints the original and filtered tables.

Pandas
import pandas as pd

# Create a sample DataFrame
products = pd.DataFrame({
    'Product': ['Apple Juice', 'Orange Juice', 'Organic Milk', 'Natural Honey', 'Regular Milk'],
    'Description': ['Fresh apple juice', 'Sweet orange juice', '100% organic milk', 'Pure natural honey', None]
})

print('Original DataFrame:')
print(products)

# Find rows where Description contains 'organic' or 'natural' ignoring case
pattern = 'organic|natural'
filtered = products[products['Description'].str.contains(pattern, case=False, na=False, regex=True)]

print('\nFiltered DataFrame (contains "organic" or "natural"):')
print(filtered)
Important Notes

Time complexity: O(n) where n is number of rows, because it checks each row's text. The work per row also grows with the length of the text and the complexity of the pattern.

Space complexity: O(n) for the boolean mask created during filtering.

Common mistake: Forgetting na=False. Missing values can then show up as NaN in the result instead of False, and filtering with that mask can raise a ValueError.
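The missing-value mistake above can be seen in a short sketch. The table is invented for illustration, and the try/except is there because what happens without na=False may differ between pandas versions and column dtypes.

```python
import pandas as pd

# Hypothetical data with one missing description
df = pd.DataFrame({'Description': ['100% organic milk', 'Plain milk', None]})

# Without na=False the mask may contain a missing value
raw_mask = df['Description'].str.contains('organic')
try:
    df[raw_mask]
    outcome = 'filtered without error'
except ValueError as exc:
    outcome = f'ValueError: {exc}'
print(outcome)

# With na=False the mask is plain True/False and filtering is safe
safe_mask = df['Description'].str.contains('organic', na=False)
print(df[safe_mask])
```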

Use str.contains() when you want to check if text includes a pattern. Use other methods like str.startswith() if you only want to check the start.
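As a quick comparison, this sketch runs both methods on the same made-up phone numbers. The second number has 415 in the middle, so only str.contains() finds it.

```python
import pandas as pd

# Hypothetical phone numbers, made up for illustration
phones = pd.Series(['415-555-0100', '212-415-0199', '650-555-0123'])

# True only when the text begins with '415'
starts_415 = phones.str.startswith('415')

# True when '415' appears anywhere in the text
has_415 = phones.str.contains('415', regex=False)

print(starts_415.tolist())  # [True, False, False]
print(has_415.tolist())     # [True, True, False]
```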

Summary

str.contains() helps find text patterns in columns.

It supports case sensitivity and regular expressions.

Pass na=False when the column may have missing values, so the result is a clean True/False mask you can filter with.