How to Use str.startswith in pandas for String Filtering
In pandas, use Series.str.startswith() to check whether each string in a column starts with a specific prefix. It returns a boolean Series that you can use to filter rows or analyze data based on how strings begin.
Syntax
The str.startswith() method is used on a pandas Series containing strings. It checks if each string starts with the given prefix and returns a Series of True or False values.
- pat: The string or tuple of strings to check at the start. It is matched literally, not as a regular expression.
- na: Optional value to use in the result for missing entries. When omitted, the result for a missing entry depends on the column's dtype.

There is no case parameter; matching is always case-sensitive.
```python
Series.str.startswith(pat, na=None)
```
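Because the prefix can be a tuple, several prefixes can be tested in one call. The sketch below uses hypothetical sample data and assumes pandas 1.5 or newer, where tuple prefixes are accepted.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
cities = pd.Series(["Paris", "Prague", "Berlin", "Porto", "Boston"])

# A tuple matches when the string starts with ANY of the listed prefixes
# (assumes pandas 1.5 or newer).
mask = cities.str.startswith(("Pa", "Bo"))

print(cities[mask].tolist())
```

A tuple is usually simpler than combining several separate masks with the | operator.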
Example
This example shows how to filter a DataFrame to keep only rows where the 'Name' column starts with 'A'.
```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Anna', 'Mike', 'Amanda'],
        'Age': [25, 30, 22, 32, 28]}
df = pd.DataFrame(data)

# Check which names start with 'A'
starts_with_a = df['Name'].str.startswith('A')

# Filter rows where 'Name' starts with 'A'
filtered_df = df[starts_with_a]
print(filtered_df)
```
Output
```
     Name  Age
0   Alice   25
2    Anna   22
4  Amanda   28
```
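The same boolean mask can also select specific columns or be inverted. The following is a minimal sketch that reuses the example data; the variable names ages_a and others are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Anna", "Mike", "Amanda"],
    "Age": [25, 30, 22, 32, 28],
})

mask = df["Name"].str.startswith("A")

# Ages of the rows whose name starts with 'A'
ages_a = df.loc[mask, "Age"]

# Rows whose name does NOT start with 'A' (the ~ operator inverts the mask)
others = df[~mask]

print(ages_a.tolist())
print(others["Name"].tolist())
```

Inverting with ~ relies on the mask holding only True and False, which is the case here because the column has no missing values.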
Common Pitfalls
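Common mistakes include:
- Using startswith on non-string columns without converting them first; the .str accessor raises an AttributeError on numeric data.
- Expecting a case-insensitive match. str.startswith is always case-sensitive and has no case parameter.
- Not handling missing (NaN) values. On an object-dtype column the result holds a missing value for those rows, and such a mask cannot be used directly to filter a DataFrame.
Always ensure the column is string type, pass na=False when missing values should count as non-matches, and normalize case (for example with str.lower()) when case should be ignored.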
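The first pitfall can be illustrated with a numeric column. This is a sketch under assumptions: the Code column and its values are hypothetical, and astype(str) is one possible way to convert.

```python
import pandas as pd

# Hypothetical numeric column, used only for illustration.
df = pd.DataFrame({"Code": [1001, 2050, 1077, 3300]})

# df["Code"].str.startswith("10") would raise AttributeError here,
# because the .str accessor only works on string data.
codes_as_text = df["Code"].astype(str)
mask = codes_as_text.str.startswith("10")

print(df[mask]["Code"].tolist())
```

Note that astype(str) also turns missing values into ordinary text, so it is best suited to columns without gaps.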
```python
import pandas as pd

data = {'Name': ['Alice', None, 'anna', 'Mike', 'Amanda'],
        'Age': [25, 30, 22, 32, 28]}
df = pd.DataFrame(data)

# Wrong: no handling of None or of case.
# The match is case-sensitive, and the None row may come back as a
# missing value instead of False, depending on pandas version and dtype.
print(df['Name'].str.startswith('A'))

# Right: fill missing values with False and lowercase before matching,
# since str.startswith has no case parameter.
print(df['Name'].str.lower().str.startswith('a', na=False))
```
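Output
On an object-dtype column, row 1 of the first result is a missing value and the dtype is object. The first result below shows the False that the newer default string dtype should produce instead.
```
0     True
1    False
2    False
3    False
4     True
Name: Name, dtype: bool
```
```
0     True
1    False
2     True
3    False
4     True
Name: Name, dtype: bool
```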
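Since str.startswith itself has no case option, another way to ignore case is Series.str.match, which anchors at the start of each string and does accept case=False. The sketch below is one possible approach, not the only one; the prefix variable is illustrative, and re.escape is used so the prefix is treated literally rather than as a regular expression.

```python
import re

import pandas as pd

names = pd.Series(["Alice", None, "anna", "Mike", "Amanda"], name="Name")

prefix = "A"  # illustrative prefix

# str.match only matches at the beginning of each string;
# re.escape keeps any special regex characters in the prefix literal.
mask = names.str.match(re.escape(prefix), case=False, na=False)

print(names[mask].tolist())
```

Lowercasing the column with str.lower() before calling str.startswith gives the same result for this data and avoids regular expressions entirely.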
Quick Reference
| Parameter | Description | Default |
|---|---|---|
| pat | String or tuple of strings to check at start | Required |
| na | Value to use for missing data (NaN) | None (result depends on dtype) |
Key Takeaways
Use Series.str.startswith(prefix) to get a boolean mask for strings starting with prefix.
Handle missing values with the na parameter (for example na=False) so the mask contains only True and False.
str.startswith is always case-sensitive; lowercase the column first with str.lower() to ignore letter case.
Always ensure the column is string type before using str.startswith.
Use the boolean result to filter DataFrame rows easily.