How to select rows where column contains in pandas


Select Rows Where Column Contains in pandas: Simple Guide

Use DataFrame.loc with df['column'].str.contains('substring') to select rows where a column contains a specific substring. str.contains() returns True for every row whose text includes the substring at any position, and loc keeps only those rows.

📐

Syntax

Use df.loc[df['column'].str.contains('substring', na=False)] to select rows where the column contains the substring. Here:

df is your DataFrame.
column is the column name to check.
str.contains() checks if the substring is in each value.
na=False treats missing values as non-matches, so the filter contains only True or False.

python

df.loc[df['column'].str.contains('substring', na=False)]
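To make the two steps visible, here is a minimal sketch that builds the boolean mask first and then passes it to loc. The city column and its values are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({'city': ['Berlin', 'Bern', None, 'Lisbon']})

# str.contains returns one boolean per row; na=False maps the missing value to False
mask = df['city'].str.contains('er', na=False)
print(mask.tolist())

# Passing the mask to .loc keeps only the rows where it is True
matches = df.loc[mask]
print(matches['city'].tolist())
```

Keeping the mask in its own variable is optional, but it makes the filter easier to inspect or combine with other conditions.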

💻

Example

This example shows how to select rows where the 'Name' column contains the substring 'an'.

python

import pandas as pd

data = {'Name': ['Anna', 'Bob', 'Charlie', 'Diana', 'Evan'], 'Age': [23, 35, 45, 29, 40]}
df = pd.DataFrame(data)

result = df.loc[df['Name'].str.contains('an', na=False)]
print(result)

Output

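    Name  Age
3  Diana   29
4   Evan   40

'Anna' is not selected because the match is case-sensitive: the name contains 'An' but not the lowercase 'an'.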

⚠️

Common Pitfalls

Common mistakes include:

Not using na=False. Missing values then stay missing in the result of str.contains(), and df.loc can raise a ValueError when the filter contains missing values.
Forgetting that str.contains() is case-sensitive by default.
Trying to use == instead of str.contains() for substring matching.

Use case=False in str.contains() to ignore case.

python

import pandas as pd

data = {'Name': ['Anna', None, 'Charlie', 'Diana', 'Evan'], 'Age': [23, 35, 45, 29, 40]}
df = pd.DataFrame(data)

# Wrong: without na=False the result for the None row is missing,
# and df.loc can raise a ValueError for such a mask
# df.loc[df['Name'].str.contains('an')]

# Right: handle missing values
result = df.loc[df['Name'].str.contains('an', na=False)]
print(result)

Output

    Name  Age
3  Diana   29
4   Evan   40
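The other two pitfalls in the list, case sensitivity and using == for substring matching, can be sketched the same way. This reuses the names from the first example; the variable names sensitive, insensitive, and exact are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Anna', 'Bob', 'Charlie', 'Diana', 'Evan'],
                   'Age': [23, 35, 45, 29, 40]})

# Default (case-sensitive): 'Anna' is skipped because it has 'An', not 'an'
sensitive = df.loc[df['Name'].str.contains('an', na=False)]

# case=False ignores case, so 'Anna' matches as well
insensitive = df.loc[df['Name'].str.contains('an', case=False, na=False)]

# == compares the whole value, and no name is exactly 'an'
exact = df.loc[df['Name'] == 'an']

print(sensitive['Name'].tolist())
print(insensitive['Name'].tolist())
print(len(exact))
```

The == comparison returns an empty DataFrame here, which is easy to mistake for "no rows contain the text".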

📊

Quick Reference

Usage	Description
df.loc[df['col'].str.contains('text', na=False)]	Select rows where 'col' contains 'text'
df.loc[df['col'].str.contains('text', case=False, na=False)]	Case-insensitive contains check
df.loc[df['col'].str.contains('^start', na=False)]	Rows where 'col' starts with 'start' (regex)
df.loc[df['col'].str.contains('end$', na=False)]	Rows where 'col' ends with 'end' (regex)
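The regex rows in the table can be tried on a small made-up column. Because str.contains() treats the pattern as a regular expression by default, characters such as '.' have a special meaning; the sketch below also shows regex=False, which matches the text literally. The sample values are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({'col': ['start here', 'the end', 'restart', 'v1.0', 'v100']})

# '^' anchors the pattern to the beginning of the value
starts = df.loc[df['col'].str.contains('^start', na=False)]

# '$' anchors the pattern to the end of the value
ends = df.loc[df['col'].str.contains('end$', na=False)]

# As a regex, '.' matches any character, so '1.0' also matches 'v100'
regex_dot = df.loc[df['col'].str.contains('1.0', na=False)]

# regex=False treats the pattern as plain text
literal_dot = df.loc[df['col'].str.contains('1.0', regex=False, na=False)]

print(starts['col'].tolist())
print(ends['col'].tolist())
print(regex_dot['col'].tolist())
print(literal_dot['col'].tolist())
```

When the search text comes from user input or may contain punctuation, regex=False is usually the safer choice.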

✅

Key Takeaways

Use df.loc with str.contains('substring', na=False) to filter rows by substring.

Set na=False so that missing values count as non-matches instead of breaking the filter.

str.contains() is case-sensitive by default; use case=False to ignore case.

str.contains() supports regular expressions for advanced matching.

Avoid using == for substring checks; str.contains() is the correct method.