Pattern matching with str.contains in Data Analysis Python

Introduction

Pattern matching with str.contains finds the rows in a pandas Series or DataFrame column whose text contains a given word or pattern. It is a quick way to filter data based on text.

You want to find all customer reviews that mention the word 'good'.
You need to filter a list of emails to find those containing 'gmail.com'.
You want to select rows where a product description includes 'organic'.
You want to check if any names in a list contain the letter 'a'.
Syntax
Data Analysis Python
import pandas as pd

data = pd.Series(['apple', 'banana', 'cherry', 'date'])

mask = data.str.contains('an')
filtered_data = data[mask]

The str.contains method returns a boolean series showing True where the pattern is found.

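The pattern is treated as a regular expression by default, so you can use regex for complex patterns. A plain string matches anywhere inside each value (a substring match, not an exact match). Pass regex=False to have the pattern treated as literal text.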
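Because patterns are interpreted as regular expressions unless regex=False is passed, characters such as '.' can match more than expected. The following is a minimal sketch of the difference; the sample version strings are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: two strings with a dot and one without
versions = pd.Series(['v1.0', 'v10', 'v2.5'])

# Default (regex=True): '.' is a wildcard that matches any character,
# so every non-empty string matches
regex_mask = versions.str.contains('.')

# regex=False: '.' is treated as a literal dot
literal_mask = versions.str.contains('.', regex=False)

print(versions[regex_mask].tolist())    # ['v1.0', 'v10', 'v2.5']
print(versions[literal_mask].tolist())  # ['v1.0', 'v2.5']
```

When the search text comes from user input or contains punctuation, regex=False is usually the safer choice.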

Examples
This finds all fruits with 'an' in their name.
Data Analysis Python
import pandas as pd

data = pd.Series(['apple', 'banana', 'cherry', 'date'])

# Find items containing 'an'
mask = data.str.contains('an')
print(data[mask])
When the data is empty, the result is also empty without error.
Data Analysis Python
import pandas as pd

# An empty Series; dtype=object keeps the .str accessor available
empty_data = pd.Series([], dtype=object)
mask_empty = empty_data.str.contains('a')
print(empty_data[mask_empty])
It also works on a Series with a single element.
Data Analysis Python
import pandas as pd

data = pd.Series(['apple'])

# Only one element, check if it contains 'a'
mask_single = data.str.contains('a')
print(data[mask_single])
Using a regular expression to find elements that start with 'a' (^ anchors the match to the start of the string).
Data Analysis Python
import pandas as pd

data = pd.Series(['apple', 'banana', 'cherry', 'date'])

# Check if elements start with 'a' using regex
mask_start_a = data.str.contains('^a')
print(data[mask_start_a])
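Matching is case-sensitive by default, so the examples above would miss values that begin with a capital letter. A small sketch using the case parameter of str.contains, with made-up mixed-case data:

```python
import pandas as pd

# Hypothetical data with mixed capitalization
data = pd.Series(['Apple', 'banana', 'Avocado', 'cherry'])

# Case-sensitive (default): '^a' only matches a lowercase 'a' at the start
mask_sensitive = data.str.contains('^a')

# case=False ignores capitalization
mask_insensitive = data.str.contains('^a', case=False)

print(data[mask_sensitive].tolist())    # []
print(data[mask_insensitive].tolist())  # ['Apple', 'Avocado']
```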
Sample Program

This program shows the original list of fruits and then filters to show only those containing the letter 'a'.

Data Analysis Python
import pandas as pd

# Create a Series of fruit names
fruit_names = pd.Series(['apple', 'banana', 'cherry', 'date', 'avocado'])

print('Original data:')
print(fruit_names)

# Find fruits containing the letter 'a'
contains_a_mask = fruit_names.str.contains('a')
filtered_fruits = fruit_names[contains_a_mask]

print('\nFruits containing the letter "a":')
print(filtered_fruits)
Important Notes

Time complexity: O(n) in the number of elements, because each string is checked once. The cost per string also grows with its length and with the complexity of the pattern.

Space complexity: O(n) for the boolean mask created.

Common mistake: forgetting that str.contains returns a boolean Series, not the matching rows. Use that Series to index the original data, as in data[mask].
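The same idea applies to a DataFrame column. The sketch below assumes a hypothetical table of reviews in which one review is missing; passing na=False makes missing text count as "no match" so the mask holds only True and False values.

```python
import pandas as pd

# Hypothetical review data; one review is missing
reviews = pd.DataFrame({
    'customer': ['Ann', 'Bob', 'Cara', 'Dev'],
    'review': ['good value', 'too slow', None, 'Good support'],
})

# na=False treats missing text as "no match";
# case=False also matches 'Good'
mask = reviews['review'].str.contains('good', case=False, na=False)

# The mask is only a column of True/False values;
# index the DataFrame with it to get the matching rows
good_reviews = reviews[mask]
print(good_reviews['customer'].tolist())  # ['Ann', 'Dev']
```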

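Use str.contains when you want to filter data by text patterns found anywhere in a string. For exact matches, use == for a literal value or str.fullmatch for a regex that must match the whole string. Note that str.match only requires the pattern to match at the start of the string.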
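The difference between substring, prefix, and whole-string matching can be seen side by side. This is a minimal sketch with made-up values; str.fullmatch is assumed to be available (it was added in pandas 1.1).

```python
import pandas as pd

data = pd.Series(['date', 'dates', 'update'])

# Pattern found anywhere in the string
print(data.str.contains('date').tolist())   # [True, True, True]

# Pattern must match at the start of the string
print(data.str.match('date').tolist())      # [True, True, False]

# Pattern must match the entire string
print(data.str.fullmatch('date').tolist())  # [True, False, False]

# Plain equality, no regex involved
print((data == 'date').tolist())            # [True, False, False]
```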

Summary

Pattern matching with str.contains helps find text inside data easily.

It returns True or False for each item, so you can filter your data.

You can use simple words or complex patterns with regular expressions.