How to Use str.contains in pandas for String Filtering
Use str.contains() on a pandas Series to check whether each string contains a given pattern. It returns a boolean Series that you can use to filter rows in a DataFrame.
Syntax
The basic syntax of str.contains() is:
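```python
Series.str.contains(pat, case=True, flags=0, na=None, regex=True)
```
Where:
- pat: The string or regex pattern to search for.
- case: Whether to match case-sensitively (default is True).
- flags: Flags from Python's re module, such as re.IGNORECASE, passed to the regex engine (default is 0).
- na: Fill value for missing values. If you do not set it, the default depends on the dtype; for object-dtype columns, missing values come back as NaN.
- regex: Whether pat is treated as a regex pattern (default is True). With regex=False, pat is matched as a literal string.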
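To see how the parameters interact, here is a small sketch on a hypothetical Series `s` (the data and variable names are illustrative only). It uses a regex pattern, shows that case=False and flags=re.IGNORECASE both give a case-insensitive match, and shows regex=False treating `^A` as plain text:

```python
import re
import pandas as pd

# Hypothetical sample data; dtype=object keeps None as a missing value
s = pd.Series(['Anna', 'bob', 'Charlie', None], dtype=object)

# pat as a regular expression: strings that start with an uppercase letter
starts_upper = s.str.contains(r'^[A-Z]', na=False)

# case=False and flags=re.IGNORECASE both ignore letter case
ci_case = s.str.contains('b', case=False, na=False)
ci_flags = s.str.contains('b', flags=re.IGNORECASE, na=False)

# regex=False: '^A' is searched for as literal text, not as an anchor
literal = s.str.contains('^A', regex=False, na=False)

print(starts_upper.tolist())  # [True, False, True, False]
print(ci_case.tolist())       # [False, True, False, False]
print(ci_flags.tolist())      # [False, True, False, False]
print(literal.tolist())       # [False, False, False, False]
```

In every call, na=False turns the missing entry into False, so each result is a plain boolean mask.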
Example
This example filters the rows of a DataFrame whose 'Name' column contains the substring 'an', ignoring case.
```python
import pandas as pd

data = {'Name': ['Anna', 'Bob', 'Charlie', 'David', 'Eleanor'],
        'Age': [23, 35, 45, 28, 32]}
df = pd.DataFrame(data)

# Filter rows where 'Name' contains 'an', ignoring case
mask = df['Name'].str.contains('an', case=False, na=False)
filtered_df = df[mask]
print(filtered_df)
```
Output
```
      Name  Age
0     Anna   23
4  Eleanor   32
```
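The case=False argument matters here. A minimal comparison on the same data shows that the default case-sensitive search misses 'Anna', because it contains 'An' rather than 'an':

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Anna', 'Bob', 'Charlie', 'David', 'Eleanor'],
                   'Age': [23, 35, 45, 28, 32]})

# Default case=True: only an exact lowercase 'an' matches
case_sensitive = df[df['Name'].str.contains('an')]
# case=False ignores letter case, so 'An' in 'Anna' also matches
case_insensitive = df[df['Name'].str.contains('an', case=False)]

print(case_sensitive['Name'].tolist())    # ['Eleanor']
print(case_insensitive['Name'].tolist())  # ['Anna', 'Eleanor']
```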
Common Pitfalls
Common mistakes when using str.contains() include:
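- Not handling missing values. Without na, missing entries come back as NaN in the result, and using that mask to index a DataFrame raises a ValueError. Use na=False to treat missing values as non-matches.
- Forgetting that pat is treated as a regex by default, which can cause unexpected matches or errors if the pattern contains special characters such as . or (. Pass regex=False to match literal text.
- Getting case sensitivity wrong. Matching is case-sensitive by default, so set case=False if 'an' should also match 'An'.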
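The regex pitfall is easiest to see with a dot. In this sketch (hypothetical version strings), '.' in the default regex mode matches any character, while regex=False or escaping the pattern with Python's re.escape searches for a literal dot:

```python
import re
import pandas as pd

s = pd.Series(['v1.0', 'v100', 'v1x0'])

# regex=True (default): '.' matches any character, so all three match
as_regex = s.str.contains('1.0')
# regex=False: '1.0' is searched for as literal text
as_literal = s.str.contains('1.0', regex=False)
# re.escape keeps regex mode but escapes the dot
as_escaped = s.str.contains(re.escape('1.0'))

print(as_regex.tolist())    # [True, True, True]
print(as_literal.tolist())  # [True, False, False]
print(as_escaped.tolist())  # [True, False, False]
```

regex=False is the simpler choice for a fixed substring; re.escape is useful when you want to embed user-supplied text inside a larger regex.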
```python
import pandas as pd

data = {'Name': ['Anna', None, 'Charlie']}
df = pd.DataFrame(data)

# Wrong: the mask holds NaN for the None row, so df[mask]
# raises a ValueError (the mask is not purely boolean)
# mask = df['Name'].str.contains('a')
# print(df[mask])

# Right: treat missing values as non-matches with na=False
mask = df['Name'].str.contains('a', na=False)
print(df[mask])
```
Output
```
      Name
0     Anna
2  Charlie
```
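To check where the error actually comes from, the sketch below inspects both masks. It assumes an explicit dtype=object Series (a hypothetical `names`), because newer pandas string dtypes may already return False or <NA> for missing values instead of NaN. The call to str.contains itself succeeds; it is indexing with the NaN-containing mask that raises:

```python
import pandas as pd

# dtype=object is an assumption: other string dtypes may handle
# missing values differently
names = pd.Series(['Anna', None, 'Charlie'], dtype=object)

raw_mask = names.str.contains('a')             # NaN for the missing entry
safe_mask = names.str.contains('a', na=False)  # missing entry becomes False

# Indexing with a mask that contains NaN is rejected
try:
    names[raw_mask]
    raised = False
except ValueError:
    raised = True

print(raw_mask.tolist())          # [True, nan, True]
print(raised)                     # True
print(names[safe_mask].tolist())  # ['Anna', 'Charlie']
```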
Quick Reference
| Parameter | Description | Default |
|---|---|---|
| pat | String or regex pattern to search for | Required |
| case | Match case-sensitively | True |
| flags | re module flags passed to the regex engine | 0 |
| na | Fill value for missing values | Depends on dtype (NaN for object dtype) |
| regex | Interpret pat as regex | True |
Key Takeaways
Use str.contains() on a pandas Series to get a boolean mask for filtering.
Set na=False so missing values count as non-matches and the mask stays boolean.
Remember that str.contains() treats the pattern as a regex by default; pass regex=False to match literal text.
Use case=False to ignore letter case when matching.
Combine str.contains() with DataFrame filtering to select rows.
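As a final sketch of that last point, the mask from str.contains() combines with other conditions like any boolean Series: & for "and", ~ for "not". The data reuses the earlier example; `has_an` and the other variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Anna', 'Bob', 'Charlie', 'David', 'Eleanor'],
                   'Age': [23, 35, 45, 28, 32]})

has_an = df['Name'].str.contains('an', case=False, na=False)

# & combines masks; the comparison needs its own parentheses
older_with_an = df[has_an & (df['Age'] > 30)]
# ~ inverts a mask: rows whose name does NOT contain 'an'
without_an = df[~has_an]

print(older_with_an['Name'].tolist())  # ['Eleanor']
print(without_an['Name'].tolist())     # ['Bob', 'Charlie', 'David']
```

Note that ~ only behaves as a logical "not" on a boolean mask, which is another reason to pass na=False.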