
How to Use str.contains in pandas for String Filtering

Use str.contains() on a pandas Series to check if each string contains a specific pattern. It returns a boolean Series that you can use to filter rows in a DataFrame.
📐

Syntax

The basic syntax of str.contains() is:

python
Series.str.contains(pat, case=True, flags=0, na=np.nan, regex=True)

Where:

  • pat: The string or regex pattern to search for.
  • case: Whether matching is case sensitive (default is True).
  • flags: Flags from the re module, such as re.IGNORECASE, passed to the regex engine (default is 0, meaning no flags).
  • na: Value to use in the result for missing entries (default is NaN for object-dtype data).
  • regex: Whether pat is interpreted as a regex pattern (default is True). If False, pat is matched as a literal substring.
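To make the parameters concrete, here is a minimal sketch that runs the same Series through several combinations of case, flags, regex, and na. The sample strings and variable names are made up for illustration and are not part of the original example. flags is a str.contains() parameter that accepts re module flags.

```python
import re

import pandas as pd

# Hypothetical sample data (not from the article), including one missing value
s = pd.Series(["apple pie", "Apple tart", None, "cherry"])

# case=False ignores letter case; na=False reports the missing entry as False
any_case = s.str.contains("apple", case=False, na=False)

# flags takes re module flags; re.IGNORECASE has the same effect as case=False
with_flag = s.str.contains("apple", flags=re.IGNORECASE, na=False)

# regex=True (the default): "^a" means "starts with a lowercase a"
starts_with_a = s.str.contains("^a", na=False)

# regex=False: "^a" is searched for as the two literal characters ^ and a
literal_caret = s.str.contains("^a", regex=False, na=False)

print(any_case.tolist())       # [True, True, False, False]
print(with_flag.tolist())      # [True, True, False, False]
print(starts_with_a.tolist())  # [True, False, False, False]
print(literal_caret.tolist())  # [False, False, False, False]
```

Passing na=False in every call keeps each result a plain boolean Series, so it can be used directly as a filter.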
💻

Example

This example shows how to filter rows in a DataFrame where the 'Name' column contains the substring 'an' (case insensitive).

python
import pandas as pd

data = {'Name': ['Anna', 'Bob', 'Charlie', 'David', 'Eleanor'],
        'Age': [23, 35, 45, 28, 32]}
df = pd.DataFrame(data)

# Filter rows where 'Name' contains 'an' ignoring case
mask = df['Name'].str.contains('an', case=False, na=False)
filtered_df = df[mask]
print(filtered_df)
Output
      Name  Age
0     Anna   23
4  Eleanor   32
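Because the result is an ordinary boolean Series, it can be inverted or combined with other conditions before filtering. The sketch below reuses the example DataFrame; the variable names has_an, without_an, and an_and_over_30 are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anna", "Bob", "Charlie", "David", "Eleanor"],
    "Age": [23, 35, 45, 28, 32],
})

# Boolean mask: True where 'Name' contains 'an', ignoring case
has_an = df["Name"].str.contains("an", case=False, na=False)

# Invert the mask with ~ to keep rows that do NOT contain 'an'
without_an = df[~has_an]

# Combine with another condition using & (each condition in parentheses)
an_and_over_30 = df[has_an & (df["Age"] > 30)]

print(without_an["Name"].tolist())      # ['Bob', 'Charlie', 'David']
print(an_and_over_30["Name"].tolist())  # ['Eleanor']
```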
⚠️

Common Pitfalls

Common mistakes when using str.contains() include:

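  • Not handling missing values. On an object-dtype column, str.contains() returns a missing value (NaN) for each missing entry, and filtering a DataFrame with such a mask raises a ValueError. Use na=False to treat missing entries as non-matches.
  • Forgetting that pat is treated as a regex by default, which can cause unexpected matches or errors if the pattern contains special characters such as . or (.
  • Forgetting that matching is case sensitive by default. For example, 'an' does not match 'Anna' unless you pass case=False.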
python
import pandas as pd

data = {'Name': ['Anna', None, 'Charlie']}
df = pd.DataFrame(data)

# Wrong: without na=False the mask holds a missing value for the None row,
# so on an object-dtype column df[mask] raises a ValueError
# mask = df['Name'].str.contains('a')

# Right: Handle missing values with na=False
mask = df['Name'].str.contains('a', na=False)
print(df[mask])
Output
      Name
0     Anna
2  Charlie
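The regex pitfall is easier to see with a small example. The following sketch uses made-up strings. It shows a pattern that matches more than intended, a pattern that is not a valid regex, and two ways to search for the text literally: regex=False or re.escape() from the standard library.

```python
import re

import pandas as pd

# Hypothetical sample data (not from the article)
s = pd.Series(["v3.12", "v3x12", "total (USD)", "total USD"])

# "." is a regex wildcard, so "3.12" also matches "3x12"
wildcard = s.str.contains("3.12")

# Fix 1: regex=False searches for the literal text
literal = s.str.contains("3.12", regex=False)

# Fix 2: keep regex matching but escape the special characters
escaped = s.str.contains(re.escape("3.12"))

# "(USD" is not a valid regex (unbalanced parenthesis), so it raises re.error
raised = False
try:
    s.str.contains("(USD")
except re.error as exc:
    raised = True
    print("regex error:", exc)

# With regex=False the same text is a valid literal search
paren = s.str.contains("(USD", regex=False)

print(wildcard.tolist())  # [True, True, False, False]
print(literal.tolist())   # [True, False, False, False]
print(escaped.tolist())   # [True, False, False, False]
print(paren.tolist())     # [False, False, True, False]
```

regex=False is the simpler choice when the whole pattern is plain text. re.escape() is useful when a literal fragment has to be embedded in a larger regex.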
📊

Quick Reference

Parameter   Description                              Default
pat         String or regex pattern to search for    Required
case        Match case sensitively                   True
flags       re module flags such as re.IGNORECASE    0
na          Fill value for missing values            NaN
regex       Interpret pat as regex                   True

Key Takeaways

Use str.contains() on a pandas Series to get a boolean mask for filtering.
Set na=False so missing values count as non-matches and the mask stays boolean.
Remember str.contains() treats the pattern as regex by default.
Use case=False to ignore letter case when matching.
Combine str.contains() with DataFrame filtering to select rows.