
Select Rows Where a Column Contains a Substring in pandas: Simple Guide

Use DataFrame.loc with a boolean mask built from df['column'].str.contains('substring') to select rows where a column contains a specific substring. The mask is True for every row whose text value contains the substring at any position.

Syntax

Use df.loc[df['column'].str.contains('substring', na=False)] to select rows where the column contains the substring. Here:

  • df is your DataFrame.
  • column is the column name to check.
  • str.contains() tests each value for the substring (the pattern is treated as a regular expression by default).
  • na=False treats missing values as non-matches, so the mask contains only True and False.
python
df.loc[df['column'].str.contains('substring', na=False)]
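To see why this works, it can help to look at the boolean mask on its own. The sketch below is a minimal illustration; the city column and its values are made up for this example and are not part of the guide's data.

```python
import pandas as pd

# Illustrative data; the column name and values are assumptions for this sketch.
df = pd.DataFrame({"city": ["Berlin", "Bern", None, "Paris"]})

# str.contains returns one True/False per row; na=False maps the missing value to False.
mask = df["city"].str.contains("Ber", na=False)
print(mask.tolist())  # [True, True, False, False]

# .loc keeps only the rows where the mask is True.
selected = df.loc[mask]
print(selected["city"].tolist())  # ['Berlin', 'Bern']
```

Building the mask as a separate variable is optional, but it makes the filter easier to inspect and to combine with other conditions.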

Example

This example shows how to select rows where the 'Name' column contains the substring 'an'.

python
import pandas as pd

data = {'Name': ['Anna', 'Bob', 'Charlie', 'Diana', 'Evan'], 'Age': [23, 35, 45, 29, 40]}
df = pd.DataFrame(data)

result = df.loc[df['Name'].str.contains('an', na=False)]
print(result)
Output
    Name  Age
3  Diana   29
4   Evan   40

'Anna' is not selected: the match is case-sensitive, and 'Anna' contains 'An' but not lowercase 'an'.

Common Pitfalls

Common mistakes include:

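  • Omitting na=False. For missing values str.contains() can return NaN rather than False, and indexing with a mask that contains NaN typically raises a ValueError.
  • Forgetting that str.contains() is case-sensitive by default.
  • Using == for substring matching. == tests whether the whole value equals the text, not whether the value contains it.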

Use case=False in str.contains() to ignore case.

python
import pandas as pd

data = {'Name': ['Anna', None, 'Charlie', 'Diana', 'Evan'], 'Age': [23, 35, 45, 29, 40]}
df = pd.DataFrame(data)

# Wrong: causes error due to None
# df.loc[df['Name'].str.contains('an')]

# Right: handle missing values
result = df.loc[df['Name'].str.contains('an', na=False)]
print(result)
Output
    Name  Age
3  Diana   29
4   Evan   40

The row with the missing name is skipped, and 'Anna' is excluded because the match is case-sensitive.
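The other two pitfalls, case sensitivity and using ==, can be checked the same way. The following is a small sketch that reuses the data from the example above; the variable names sensitive, insensitive, and exact are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anna", None, "Charlie", "Diana", "Evan"],
    "Age": [23, 35, 45, 29, 40],
})

# Case-sensitive (default): 'Anna' has no lowercase 'an', so it is skipped.
sensitive = df.loc[df["Name"].str.contains("an", na=False)]
print(sensitive["Name"].tolist())  # ['Diana', 'Evan']

# Case-insensitive: the 'An' in 'Anna' now counts as a match.
insensitive = df.loc[df["Name"].str.contains("an", case=False, na=False)]
print(insensitive["Name"].tolist())  # ['Anna', 'Diana', 'Evan']

# == tests whole-value equality, and no name is exactly 'an'.
exact = df.loc[df["Name"] == "an"]
print(len(exact))  # 0
```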

Quick Reference

Usage | Description
df.loc[df['col'].str.contains('text', na=False)] | Select rows where 'col' contains 'text'
df.loc[df['col'].str.contains('text', case=False, na=False)] | Case-insensitive contains check
df.loc[df['col'].str.contains('^start', na=False)] | Rows where 'col' starts with 'start' (regex)
df.loc[df['col'].str.contains('end$', na=False)] | Rows where 'col' ends with 'end' (regex)
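Because the pattern is a regular expression by default, characters such as . have a special meaning. The sketch below tries the two anchored patterns from the table and then compares regex matching with literal matching via regex=False. The sample values are invented for this illustration.

```python
import pandas as pd

# Illustrative values chosen for this sketch.
df = pd.DataFrame({"col": ["start here", "the end", "a.b", "axb", None]})

# '^' anchors the pattern to the beginning of the value.
starts = df.loc[df["col"].str.contains("^start", na=False)]
print(starts["col"].tolist())  # ['start here']

# '$' anchors the pattern to the end of the value.
ends = df.loc[df["col"].str.contains("end$", na=False)]
print(ends["col"].tolist())  # ['the end']

# As a regex, '.' matches any character, so 'a.b' also matches 'axb'.
as_regex = df.loc[df["col"].str.contains("a.b", na=False)]
print(as_regex["col"].tolist())  # ['a.b', 'axb']

# regex=False treats the pattern as literal text.
literal = df.loc[df["col"].str.contains("a.b", regex=False, na=False)]
print(literal["col"].tolist())  # ['a.b']
```

When the search text comes from user input or contains punctuation, regex=False is usually the safer choice.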

Key Takeaways

  • Use df.loc with str.contains('substring', na=False) to filter rows by substring.
  • Set na=False so that missing values count as non-matches instead of breaking the filter.
  • str.contains() is case-sensitive by default; pass case=False to ignore case.
  • str.contains() treats the pattern as a regular expression by default, which allows advanced matching.
  • Use str.contains() rather than == for substring checks; == only matches whole values.