What is str.contains() for pattern matching in Pandas?


str.contains() for pattern matching in Pandas

Introduction

str.contains() checks whether each string in a column contains a given word or pattern. It returns a True/False value per row, which makes it easy to check or filter data based on text.

You want to find all rows in a table where a column has a specific word.

You need to filter emails that contain '@gmail.com'.

You want to check if product descriptions mention 'organic'.

You want to find phone numbers that start with a certain area code.

You want to detect if a comment contains a certain keyword.

Syntax

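df['column_name'].str.contains(pat, case=True, flags=0, na=None, regex=True)

pat is the text or regular expression you want to find.

case=True means matching is case sensitive. Use case=False to ignore case.

flags passes regex flags from Python's re module, such as re.IGNORECASE.

na is the value returned for missing entries. Pass na=False to treat missing values as non-matches.

regex=True treats pat as a regular expression. Use regex=False to match pat as plain text.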
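The parameters above can be sketched on a small Series. The data and variable names here are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample data, including one missing value
names = pd.Series(['John Smith', 'johnny', 'Alice', None])

# case=False ignores case; na=False turns the missing entry into False
loose_mask = names.str.contains('john', case=False, na=False)

# regex=False treats the pattern as plain text, so '.' is just a dot
literal_mask = names.str.contains('J.hn', regex=False, na=False)

print(loose_mask.tolist())    # [True, True, False, False]
print(literal_mask.tolist())  # [False, False, False, False]
```

The second call finds nothing because no name contains the literal text 'J.hn'. With regex=True the same pattern would match 'John', since the dot stands for any character.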

Examples

Find rows where 'Name' contains the text 'John' (case sensitive). This is a substring match, so 'Johnson' also matches.

df['Name'].str.contains('John')

Find emails containing '@gmail.com', ignoring case differences. regex=False makes the dot match only a literal dot.

df['Email'].str.contains('@gmail.com', case=False, regex=False)
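With the default regex=True, the dot in '@gmail.com' matches any character, so look-alike addresses can slip through. The sketch below compares the regex, literal, and escaped forms on made-up addresses.

```python
import re
import pandas as pd

# Hypothetical addresses; the second one is a look-alike
emails = pd.Series(['ann@gmail.com', 'bob@gmailxcom.org', 'CAT@GMAIL.COM'])

# Default regex=True: '.' matches any character, so the look-alike matches
as_regex = emails.str.contains('@gmail.com', case=False)

# regex=False: the pattern is plain text
as_literal = emails.str.contains('@gmail.com', case=False, regex=False)

# re.escape keeps regex mode but makes the dot literal
escaped = emails.str.contains(re.escape('@gmail.com'), case=False)

print(as_regex.tolist())    # [True, True, True]
print(as_literal.tolist())  # [True, False, True]
print(escaped.tolist())     # [True, False, True]
```

Use regex=False for fixed text, and re.escape() when a literal fragment has to be combined with real regex syntax.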

Find descriptions containing either 'organic' or 'natural' using regex.

df['Description'].str.contains('organic|natural', regex=True)

Find phone numbers that start with three digits (such as an area code) using regex. The r prefix makes a raw string, so the backslash in \d reaches the regex engine unchanged.

df['Phone'].str.contains(r'^\d{3}', regex=True)
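A minimal sketch of the area-code idea, using made-up phone numbers. The second pattern, which looks for the specific code 415 with an optional opening parenthesis, is an assumption added for illustration.

```python
import pandas as pd

# Hypothetical phone numbers in mixed formats
phones = pd.Series(['415-555-0100', '(415) 555-0101', '212-555-0102', '55-1234'])

# '^' anchors the match at the start; \d{3} means exactly three digits
starts_with_3_digits = phones.str.contains(r'^\d{3}')

# Specific area code 415, allowing an optional '(' in front
area_415 = phones.str.contains(r'^\(?415')

print(starts_with_3_digits.tolist())  # [True, False, True, False]
print(area_415.tolist())              # [True, True, False, False]
```

The first pattern misses '(415) 555-0101' because that string starts with a parenthesis, not a digit. Real phone data usually needs a pattern that accounts for such formatting differences.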

Sample Program

This program creates a table of products with descriptions. It then finds all products whose description mentions 'organic' or 'natural', ignoring case. It prints the original and filtered tables.

import pandas as pd

# Create a sample DataFrame
products = pd.DataFrame({
    'Product': ['Apple Juice', 'Orange Juice', 'Organic Milk', 'Natural Honey', 'Regular Milk'],
    'Description': ['Fresh apple juice', 'Sweet orange juice', '100% organic milk', 'Pure natural honey', None]
})

print('Original DataFrame:')
print(products)

# Find rows where Description contains 'organic' or 'natural' ignoring case
pattern = 'organic|natural'
filtered = products[products['Description'].str.contains(pattern, case=False, na=False, regex=True)]

print('\nFiltered DataFrame (contains "organic" or "natural"):')
print(filtered)

Important Notes

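Time complexity: roughly O(n) where n is the number of rows, because it checks each row's text. The cost per row also grows with the length of the text and the complexity of the pattern.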

Space complexity: O(n) for the boolean mask created during filtering.

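Common mistake: Forgetting na=False. For an object-dtype column with missing values, the result then contains NaN instead of True or False, and using it to filter rows raises a ValueError.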
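The missing-value pitfall can be sketched as follows. The column is forced to object dtype here as an assumption, because the result for missing entries depends on the dtype and pandas version.

```python
import pandas as pd

# Hypothetical descriptions with one missing value, stored as object dtype
desc = pd.Series(['100% organic milk', None, 'Sweet orange juice'], dtype=object)

# Without na=False the missing entry is not a plain True/False
raw_mask = desc.str.contains('organic')

# With na=False the missing entry counts as "no match"
safe_mask = desc.str.contains('organic', na=False)

try:
    desc[raw_mask]  # may fail: the mask can contain NaN
except ValueError as err:
    print('Filtering failed:', err)

print(desc[safe_mask].tolist())  # ['100% organic milk']
```

If a mask was already built without na=False, calling .fillna(False) on it before filtering is another way to get a clean Boolean mask.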

Use str.contains() when you want to check if text includes a pattern. Use other methods like str.startswith() if you only want to check the start.
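The difference between the two methods shows up on a small example. The product codes below are made up for illustration.

```python
import pandas as pd

# Hypothetical product codes
codes = pd.Series(['AB-100', 'XAB-200', 'ab-300'])

# contains: 'AB' may appear anywhere in the string
anywhere = codes.str.contains('AB', regex=False)

# startswith: 'AB' must be at the very beginning
at_start = codes.str.startswith('AB')

print(anywhere.tolist())  # [True, True, False]
print(at_start.tolist())  # [True, False, False]
```

Both calls are case sensitive here, which is why 'ab-300' matches neither.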

Summary

str.contains() helps find text patterns in columns.

It supports case-sensitive or case-insensitive matching, plain text, and regular expressions.

Handle missing data with na=False so the result can be used safely to filter rows.