Standardizing column names in Pandas - Step-by-Step Execution

Concept Flow - Standardizing column names

Start with DataFrame

↓

Access column names

↓

Apply standardization steps

↓

Update DataFrame columns

↓

Use standardized columns for analysis

We start with a DataFrame, get its column names, change them to a standard format, and then update the DataFrame with these new names.

Execution Sample

import pandas as pd

df = pd.DataFrame({
    'First Name': ['Alice', 'Bob'],
    'AGE ': [25, 30],
    'eMail': ['a@example.com', 'b@example.com']
})

# Standardize columns
cols = df.columns.str.strip().str.lower().str.replace(' ', '_')
df.columns = cols
print(df.columns)

This code creates a DataFrame with messy column names, then standardizes them by stripping leading and trailing whitespace, converting to lowercase, and replacing the remaining spaces with underscores.
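One caveat with this approach: replace(' ', '_') swaps each space individually, so a name with two consecutive spaces ends up with two underscores, and tabs are left alone. The sketch below is a variation, not part of the sample, that uses a regular expression to collapse any run of whitespace into a single underscore. The column names in it are made up for illustration.

```python
import pandas as pd

# Hypothetical column names: one has a double space, one has a tab.
cols = pd.Index(['First  Name', 'Zip\tCode', ' AGE '])

# Literal replacement, as in the sample: every single space becomes '_'.
literal = cols.str.strip().str.lower().str.replace(' ', '_', regex=False)

# Regex variation: any run of whitespace becomes one '_'.
collapsed = cols.str.strip().str.lower().str.replace(r'\s+', '_', regex=True)

print(list(literal))
print(list(collapsed))
```

Whether the regex form is worth it depends on how messy the incoming headers are; for names with only single spaces, both versions give the same result.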

Execution Table

Step	Action	Original Columns	After strip()	After lower()	After replace(' ', '_')	Final Columns
1	Start with DataFrame columns	['First Name', 'AGE ', 'eMail']	['First Name', 'AGE', 'eMail']	['first name', 'age', 'email']	['first_name', 'age', 'email']	['first_name', 'age', 'email']
2	Assign standardized columns to df.columns	N/A	N/A	N/A	N/A	['first_name', 'age', 'email']
3	Print df.columns	N/A	N/A	N/A	N/A	Index(['first_name', 'age', 'email'], dtype='object')

💡 All columns standardized and assigned back to DataFrame

Variable Tracker

Variable	Start	After strip()	After lower()	After replace(' ', '_')	Final
df.columns	['First Name', 'AGE ', 'eMail']	['First Name', 'AGE', 'eMail']	['first name', 'age', 'email']	['first_name', 'age', 'email']	['first_name', 'age', 'email']
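To reproduce the intermediate values in the tables above, the chained call can be split into one variable per step. This is a minimal sketch using the same DataFrame as the sample; the variable names stripped, lowered, and replaced are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'First Name': ['Alice', 'Bob'],
    'AGE ': [25, 30],
    'eMail': ['a@example.com', 'b@example.com']
})

# Each .str method returns a new Index, so every step can be inspected.
stripped = df.columns.str.strip()          # drop leading/trailing whitespace
lowered = stripped.str.lower()             # lowercase every name
replaced = lowered.str.replace(' ', '_')   # inner spaces become underscores

print(list(stripped))
print(list(lowered))
print(list(replaced))
```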

Key Moments - 3 Insights

Why do we use str.strip() before str.lower()?

What does str.replace(' ', '_') do to the column names?

Why do we assign the new column names back to df.columns?
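One way to explore these questions is to run the steps in a different order and compare the results. The sketch below assumes the same column names as the sample. Swapping strip() and lower() gives the same output; the ordering that changes the result is strip() relative to replace(), because a trailing space that has already been turned into an underscore can no longer be stripped. The sketch also shows that the DataFrame keeps its old names until the result is assigned to df.columns.

```python
import pandas as pd

df = pd.DataFrame({
    'First Name': ['Alice'],
    'AGE ': [25],
    'eMail': ['a@example.com']
})

# strip() before replace(): the trailing space in 'AGE ' is removed first.
strip_first = df.columns.str.strip().str.lower().str.replace(' ', '_')

# replace() before strip(): the trailing space has already become '_'.
replace_first = df.columns.str.replace(' ', '_').str.strip().str.lower()

print(list(strip_first))
print(list(replace_first))

# The string methods return a new Index; df is unchanged until we assign.
before = list(df.columns)
df.columns = strip_first
after = list(df.columns)
print(before)
print(after)
```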

Visual Quiz - 3 Questions

Test your understanding

Look at the Execution Table. What is the column name for 'AGE ' after applying str.strip()?

A. 'AGE'

B. 'age '

C. 'AGE '

D. 'age'

Concept Snapshot

Standardizing column names:
- Access columns with df.columns
- Use str.strip() to remove leading and trailing whitespace
- Use str.lower() to lowercase every name
- Use str.replace(' ', '_') to replace spaces with underscores
- Assign back to df.columns
- Makes columns consistent and code-friendly
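The steps in this snapshot can be wrapped in a small reusable function. The sketch below is one possible packaging, not something the tutorial defines: the name standardize_columns, the choice to return a copy, and the duplicate check are all assumptions made here. The check is included because two different raw names (for example 'Age' and 'AGE ') can collapse to the same standardized name.

```python
import pandas as pd


def standardize_columns(df):
    """Return a copy of df with stripped, lowercased, underscore-separated names."""
    out = df.copy()
    out.columns = out.columns.str.strip().str.lower().str.replace(' ', '_')
    # Different raw names can collapse to the same standardized name.
    if not out.columns.is_unique:
        dupes = out.columns[out.columns.duplicated()].tolist()
        raise ValueError(f"duplicate column names after standardizing: {dupes}")
    return out


# Hypothetical data for illustration.
raw = pd.DataFrame({'Order ID': [1, 2], ' Unit Price': [9.5, 3.0]})
clean = standardize_columns(raw)
print(list(clean.columns))
```

Returning a copy leaves the original DataFrame untouched, which makes it easy to compare the names before and after.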

Full Transcript

We start with a DataFrame whose column names contain spaces, uppercase letters, and inconsistent formatting. We access the names with df.columns, then apply string methods step by step: strip() removes leading and trailing whitespace, lower() converts all letters to lowercase, and replace(' ', '_') changes the remaining spaces to underscores. Each method returns a new set of names rather than modifying the DataFrame, so we assign the result back to df.columns to update it. This leaves the column names clean and consistent, which makes them easier to reference in later analysis.