Real-world data often contains mistakes or several spellings of the same value. Standardizing these values leads to clearer, more accurate results.
Handling inconsistent values in Pandas
Introduction
Cleaning inconsistent values is useful in situations like these:
A list of countries where some entries are written differently, such as 'USA' and 'United States'.
Survey answers with typos or mixed case, such as 'Yes', 'yes', and 'YES'.
Dates or categories written in different formats.
Grouping or analysis where inconsistent values produce the wrong groups.
Merging data from different sources that use different naming styles.
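To see how inconsistent values split a group, here is a minimal sketch. The DataFrame, its column names ('answer', 'count'), and its values are made up for illustration and are not part of the original tutorial.

```python
import pandas as pd

# Hypothetical survey data: the same answer appears in several spellings
df = pd.DataFrame({
    "answer": ["Yes", "yes", "YES ", "No", "no"],
    "count": [1, 1, 1, 1, 1],
})

# Grouping the raw values treats every spelling as its own category
raw_groups = df.groupby("answer")["count"].sum()

# Normalizing whitespace and case first yields the intended two groups
clean_answer = df["answer"].str.strip().str.lower()
clean_groups = df.groupby(clean_answer)["count"].sum()

print(len(raw_groups))   # 5 groups before cleaning
print(clean_groups)      # 'no' sums to 2, 'yes' sums to 3
```

The raw column produces five groups for what are really two answers, which is the kind of silent error that cleaning prevents.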
Syntax
Pandas
df['column'] = df['column'].str.lower()
df['column'] = df['column'].str.strip()
df['column'] = df['column'].replace({'old_value': 'new_value'})
Use str.lower() or str.upper() to make text consistent in case.
Use str.strip() to remove unwanted spaces before or after text.
Use replace() to fix specific wrong or inconsistent values.
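The three steps can also be chained into a single expression. The sketch below assumes a made-up 'status' column with illustrative values. It strips spaces first so that the replace() lookup sees clean keys.

```python
import pandas as pd

# Hypothetical column with stray spaces, mixed case, and one abbreviation
df = pd.DataFrame({"status": [" Active", "ACTIVE", "actv", "inactive "]})

df["status"] = (
    df["status"]
    .str.strip()                   # drop leading/trailing spaces
    .str.lower()                   # unify case
    .replace({"actv": "active"})   # fix a known abbreviation
)

print(df["status"].tolist())  # ['active', 'active', 'active', 'inactive']
```

Order matters here: if replace() ran before strip() and lower(), a key such as 'actv' would not match a value like 'ACTV '.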
Examples
This changes all color names to lowercase to avoid case differences.
Pandas
df['color'] = df['color'].str.lower()
This replaces different ways of writing USA with a single standard name.
Pandas
df['country'] = df['country'].replace({'USA': 'United States', 'U.S.A.': 'United States'})
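Note that replace() with a dictionary matches whole cell values by default, so any variant not listed in the dictionary is left untouched. A small sketch with made-up values shows this:

```python
import pandas as pd

# Hypothetical values: two listed variants, one unlisted, one longer string
countries = pd.Series(["USA", "U.S.A.", "US", "USA today"])

mapping = {"USA": "United States", "U.S.A.": "United States"}
fixed = countries.replace(mapping)

print(fixed.tolist())
# ['United States', 'United States', 'US', 'USA today']
```

'US' stays as it is because it is not a key in the mapping, and 'USA today' stays as it is because only complete values are compared, not substrings.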
This removes extra spaces before or after names.
Pandas
df['name'] = df['name'].str.strip()
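Keep in mind that str.strip() only touches the start and end of each string. If repeated spaces inside a value are also a problem, one possible approach (an addition to the tutorial's method, shown with made-up names) is to collapse them with a regular expression:

```python
import pandas as pd

# Hypothetical names with leading, trailing, and repeated inner spaces
names = pd.Series(["  Alice", "Bob  ", "Mary   Ann "])

# strip() removes only the outer whitespace
stripped = names.str.strip()

# Collapse any run of whitespace inside the string into a single space
collapsed = stripped.str.replace(r"\s+", " ", regex=True)

print(stripped.tolist())   # ['Alice', 'Bob', 'Mary   Ann']
print(collapsed.tolist())  # ['Alice', 'Bob', 'Mary Ann']
```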
Sample Program
This code fixes inconsistent country names by making them lowercase, removing spaces, and replacing variations of USA with 'united states'.
Pandas
import pandas as pd

data = {'Country': ['USA', 'usa ', 'U.S.A.', 'Canada', 'canada', 'CANADA ']}
df = pd.DataFrame(data)

# Make all country names lowercase
df['Country'] = df['Country'].str.lower()

# Remove extra spaces
df['Country'] = df['Country'].str.strip()

# Replace different USA forms with 'united states'
df['Country'] = df['Country'].replace({'usa': 'united states', 'u.s.a.': 'united states'})

print(df)
Output
All six rows are normalized: the first three read 'united states' and the last three read 'canada'.
Important Notes
Always check your data first to see what inconsistencies exist.
Use str.strip() to remove unwanted spaces that can cause mismatches.
Replacing values helps unify different spellings or abbreviations.
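One way to check the data first is to list the distinct values and how often each occurs. This sketch reuses the country values from the sample program. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Country": ["USA", "usa ", "U.S.A.", "Canada", "canada", "CANADA "]})

# Show every distinct raw value with its frequency
print(df["Country"].value_counts())

# Count distinct values before and after basic normalization
raw_distinct = df["Country"].nunique()
normalized = df["Country"].str.strip().str.lower()
normalized_distinct = normalized.nunique()

print(raw_distinct)         # 6
print(normalized_distinct)  # 3: 'usa', 'u.s.a.', 'canada'
```

The values that remain after normalization ('usa' and 'u.s.a.' here) are the ones that still need an explicit replace() mapping.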
Summary
Inconsistent values can cause wrong analysis results.
Use string methods like lower(), upper(), and strip() to clean text data.
Use replace() to fix specific inconsistent values.