Select Rows Where Column Contains in pandas: Simple Guide
Use `DataFrame.loc` with `column.str.contains('substring')` to select rows where a column contains a specific substring. `str.contains()` returns a boolean mask that is `True` wherever the substring appears anywhere in the column's text values, and `.loc` keeps only those rows.

Syntax
Use `df.loc[df['column'].str.contains('substring', na=False)]` to select rows where the column contains the substring. Here:

- `df` is your DataFrame.
- `column` is the name of the column to check.
- `str.contains()` checks whether the substring occurs in each value.
- `na=False` treats missing values as non-matches, which avoids errors when the column contains `None` or `NaN`.
```python
df.loc[df['column'].str.contains('substring', na=False)]
```
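To see what `.loc` actually receives, it can help to build the mask on its own first. The sketch below uses a small illustrative DataFrame (the names are made up for the demo) and prints the boolean mask before filtering with it.

```python
import pandas as pd

# Illustrative data, not from the original example
df = pd.DataFrame({'Name': ['Anna', 'Bob', 'Diana']})

# str.contains() returns a boolean Series aligned with df's index
mask = df['Name'].str.contains('an', na=False)
print(mask.tolist())  # [False, False, True]

# .loc keeps only the rows where the mask is True
print(df.loc[mask])
```

Splitting the mask into its own variable also makes it easy to reuse or combine with other conditions later.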
Example
This example shows how to select rows where the 'Name' column contains the substring 'an'.
```python
import pandas as pd

data = {'Name': ['Anna', 'Bob', 'Charlie', 'Diana', 'Evan'],
        'Age': [23, 35, 45, 29, 40]}
df = pd.DataFrame(data)

# Keep rows whose 'Name' contains the substring 'an'
result = df.loc[df['Name'].str.contains('an', na=False)]
print(result)
```
Output
```
    Name  Age
3  Diana   29
4   Evan   40
```
"Anna" is not selected because the match is case-sensitive: it contains "An", not "an".
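Passing `case=False` makes the comparison case-insensitive, so the capital "A" in "Anna" no longer prevents a match. A minimal sketch using the same data:

```python
import pandas as pd

data = {'Name': ['Anna', 'Bob', 'Charlie', 'Diana', 'Evan'],
        'Age': [23, 35, 45, 29, 40]}
df = pd.DataFrame(data)

# case=False ignores letter case, so 'An' in 'Anna' counts as 'an'
result = df.loc[df['Name'].str.contains('an', case=False, na=False)]
print(result['Name'].tolist())  # ['Anna', 'Diana', 'Evan']
```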
Common Pitfalls
Common mistakes include:

- Omitting `na=False`, which causes an error when the column has missing values.
- Forgetting that `str.contains()` is case-sensitive by default.
- Using `==` instead of `str.contains()` for substring matching; `==` only matches values that are exactly equal to the string.
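The difference between `==` and `str.contains()` is easiest to see side by side. In this small illustrative DataFrame, no value equals `'an'` exactly, yet two values contain it:

```python
import pandas as pd

# Illustrative data for comparing exact and substring matching
df = pd.DataFrame({'Name': ['Anna', 'Diana', 'Evan']})

# == compares the whole value, so no row equals 'an'
exact = df.loc[df['Name'] == 'an']
print(len(exact))  # 0

# str.contains() checks whether 'an' appears anywhere in the value
partial = df.loc[df['Name'].str.contains('an', na=False)]
print(partial['Name'].tolist())  # ['Diana', 'Evan']
```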
Use `case=False` in `str.contains()` to ignore case. The next example shows the missing-value pitfall and its fix.
```python
import pandas as pd

data = {'Name': ['Anna', None, 'Charlie', 'Diana', 'Evan'],
        'Age': [23, 35, 45, 29, 40]}
df = pd.DataFrame(data)

# Wrong: causes an error because of the None value
# df.loc[df['Name'].str.contains('an')]

# Right: na=False treats missing values as non-matches
result = df.loc[df['Name'].str.contains('an', na=False)]
print(result)
```
Output
```
    Name  Age
3  Diana   29
4   Evan   40
```
Quick Reference
| Usage | Description |
|---|---|
| df.loc[df['col'].str.contains('text', na=False)] | Select rows where 'col' contains 'text' |
| df.loc[df['col'].str.contains('text', case=False, na=False)] | Case-insensitive contains check |
| df.loc[df['col'].str.contains('^start', na=False)] | Rows where 'col' starts with 'start' (regex) |
| df.loc[df['col'].str.contains('end$', na=False)] | Rows where 'col' ends with 'end' (regex) |
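The table rows can be tried against one small DataFrame. The column name `col` and its values below are illustrative assumptions chosen so that each pattern selects a different row; the `None` entry shows `na=False` excluding a missing value.

```python
import pandas as pd

# Illustrative values; None exercises na=False
df = pd.DataFrame({'col': ['start here', 'the end', 'Text block', 'restart', None]})

# Case-insensitive: 'Text block' matches 'text'
insensitive = df.loc[df['col'].str.contains('text', case=False, na=False)]
print(insensitive['col'].tolist())  # ['Text block']

# '^' anchors at the start: 'restart' contains 'start' but does not begin with it
starts = df.loc[df['col'].str.contains('^start', na=False)]
print(starts['col'].tolist())  # ['start here']

# '$' anchors at the end of the value
ends = df.loc[df['col'].str.contains('end$', na=False)]
print(ends['col'].tolist())  # ['the end']
```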
Key Takeaways
- Use `df.loc` with `str.contains('substring', na=False)` to filter rows by substring.
- Set `na=False` whenever the column may contain missing values; it treats them as non-matches instead of causing an error.
- `str.contains()` is case-sensitive by default; use `case=False` to ignore case.
- `str.contains()` supports regular expressions for advanced matching.
- Avoid `==` for substring checks: it only matches exact values, while `str.contains()` matches anywhere in the text.
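Because patterns are treated as regular expressions by default (`regex=True`), characters such as `.` have special meaning. The sketch below, with illustrative file names, shows a dot matching any character, `regex=False` forcing a literal match, and an alternation that selects either of two extensions.

```python
import pandas as pd

# Illustrative file names
df = pd.DataFrame({'File': ['report.csv', 'reportXcsv', 'data.json', 'notes.txt']})

# Default regex=True: '.' matches any character, so 'reportXcsv' also matches
as_regex = df.loc[df['File'].str.contains('report.csv', na=False)]
print(as_regex['File'].tolist())  # ['report.csv', 'reportXcsv']

# regex=False treats the pattern as a plain substring
literal = df.loc[df['File'].str.contains('report.csv', regex=False, na=False)]
print(literal['File'].tolist())  # ['report.csv']

# Alternation in a non-capturing group: names ending in .csv or .json
either = df.loc[df['File'].str.contains(r'\.(?:csv|json)$', na=False)]
print(either['File'].tolist())  # ['report.csv', 'data.json']
```

The non-capturing group `(?:...)` is used because pandas warns when a `str.contains()` pattern has capturing groups.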