
Combining multiple cleaning steps in Pandas

Introduction

Cleaning data usually takes several small fixes. Chaining those fixes into a single expression lets you apply them in one place, which saves time and keeps the code tidy.

Chaining is a good fit when:
You have a messy table with missing values and wrong formats.
You want to fix text and numbers in one go before analysis.
You need to remove duplicates and fill blanks together.
You want to prepare data quickly for a report or chart.
Syntax
Pandas
df = (df
      .dropna()
      .rename(columns={'old': 'new'})
      .astype({'col': 'int'})
      .assign(new_col=lambda x: x['col'] * 2))

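Wrap the chain in parentheses so each method can sit on its own line without backslashes.

Each of these methods returns a new DataFrame by default, so the next method can be called directly on the result.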
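As a minimal sketch of the syntax above, the following runs the same chain on a small made-up table and checks that the original DataFrame is left unchanged. The values in df and the name cleaned are illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample data: one missing value in 'old', numbers stored as text in 'col'
df = pd.DataFrame({"old": [1.0, None, 3.0], "col": ["4", "5", "6"]})

cleaned = (df
           .dropna()                                # drop the row with the missing value
           .rename(columns={"old": "new"})          # rename a column
           .astype({"col": "int"})                  # text -> integer
           .assign(new_col=lambda x: x["col"] * 2)) # derive a new column

print(list(df.columns))       # the original still has its old column names
print(list(cleaned.columns))  # the chained result has the renamed and added columns
```

Because nothing is modified in place, the result has to be assigned to a name (here cleaned, or df itself as in the syntax example).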

Examples
Remove duplicate rows, then fill missing values with zero.
Pandas
df = (df
      .drop_duplicates()
      .fillna(0))
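To see this chain on concrete data, here is a small sketch with a made-up table (the column names Id and Score and their values are assumptions for illustration).

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are identical, row 2 has a missing Score
df = pd.DataFrame({
    "Id": [1, 1, 2],
    "Score": [10.0, 10.0, None],
})

result = (df
          .drop_duplicates()  # keeps the first of the two identical rows
          .fillna(0))         # replaces the remaining missing Score with 0

print(result)
```

The surviving rows keep their original index labels (0 and 2 here). If a fresh 0..n-1 index is wanted, .reset_index(drop=True) can be appended to the chain.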
Change column name and convert Age to integer type.
Pandas
df = (df
      .rename(columns={'Name': 'FullName'})
      .astype({'Age': 'int'}))
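A short sketch of the rename-then-convert chain on made-up data (the names and ages below are assumed for illustration), printing the column types before and after:

```python
import pandas as pd

# Hypothetical data: Age is stored as text
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": ["25", "30"]})
print(df.dtypes)  # Age is not numeric yet

result = (df
          .rename(columns={"Name": "FullName"})
          .astype({"Age": "int"}))
print(result.dtypes)  # Age is now an integer column
```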
Add a new column with squared scores, then drop rows missing Score.
Pandas
df = (df
      .assign(ScoreSquared=lambda x: x['Score'] ** 2)
      .dropna(subset=['Score']))
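In this particular chain the order of the two steps does not change the result, because squaring a missing value just gives another missing value and that row is dropped afterwards. The sketch below checks this on a made-up Score column (the values are assumptions for illustration).

```python
import pandas as pd

# Hypothetical data with one missing Score
df = pd.DataFrame({"Score": [3.0, None, 5.0]})

# Order used in the example: add the column first, then drop rows missing Score
assign_first = (df
                .assign(ScoreSquared=lambda x: x["Score"] ** 2)
                .dropna(subset=["Score"]))

# Alternative order: drop first, then add the column
drop_first = (df
              .dropna(subset=["Score"])
              .assign(ScoreSquared=lambda x: x["Score"] ** 2))

print(assign_first.equals(drop_first))
```

Dropping first does slightly less work, since the new column is computed only for rows that are kept.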
Sample Program

This code cleans the data by removing duplicate rows, dropping rows that are missing Name or Age, converting Age to an integer type, filling missing Score values with 0, and adding a calculated ScoreSquared column.

Pandas
import pandas as pd

# Create sample data
raw_data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Bob', None],
    'Age': ['25', '30', None, '30', '22'],
    'Score': [85, 90, 78, 90, None]
}
df = pd.DataFrame(raw_data)

# Combine cleaning steps
clean_df = (df
            .drop_duplicates()  # Remove duplicate rows
            .dropna(subset=['Name', 'Age'])  # Drop rows missing Name or Age
            .astype({'Age': 'int'})  # Convert Age to integer
            .fillna({'Score': 0})  # Fill missing Score with 0
            .assign(ScoreSquared=lambda x: x['Score'] ** 2))  # Add squared Score

print(clean_df)
Output
    Name  Age  Score  ScoreSquared
0  Alice   25   85.0        7225.0
1    Bob   30   90.0        8100.0
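When a chain gives an unexpected result, it can help to unroll it into named intermediate steps. The sketch below does that for the sample program, using the same data; the names step1 to step4 are introduced here only for illustration.

```python
import pandas as pd

raw_data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Bob', None],
    'Age': ['25', '30', None, '30', '22'],
    'Score': [85, 90, 78, 90, None]
}
df = pd.DataFrame(raw_data)

# The same steps as the chain, one at a time
step1 = df.drop_duplicates()                    # the second Bob row is removed
step2 = step1.dropna(subset=['Name', 'Age'])    # rows missing Name or Age are removed
step3 = step2.astype({'Age': 'int'})            # Age text -> integer
step4 = step3.fillna({'Score': 0})              # any missing Score -> 0
clean_df = step4.assign(ScoreSquared=lambda x: x['Score'] ** 2)

# Row counts show where rows were lost
for label, frame in [('raw', df), ('no duplicates', step1), ('no missing keys', step2)]:
    print(label, len(frame))
```

With this data, the only row with a missing Score also has a missing Name, so it is already gone before the fillna step runs.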
Important Notes

Order matters: drop or fill missing values before converting a column to an integer type, because astype('int') raises an error when the column still contains missing values.
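The following sketch illustrates the ordering issue on a made-up Age column (the values are assumptions). The exact exception type and message can differ between pandas versions, so the code catches both TypeError and ValueError.

```python
import pandas as pd

# Hypothetical data: Age stored as text, with one missing value
df = pd.DataFrame({"Age": ["25", None, "22"]})

# Converting first fails because the missing value cannot become an integer
try:
    df.astype({"Age": "int"})
    failed = False
except (TypeError, ValueError) as exc:
    failed = True
    print("Conversion failed:", type(exc).__name__)

# Dropping the missing value first makes the conversion succeed
ok = (df
      .dropna(subset=["Age"])
      .astype({"Age": "int"}))
print(ok)
```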

Use assign() to add new columns based on existing data.
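As a small sketch of why assign() is usually given a lambda inside a chain: the lambda receives the DataFrame as it exists at that point in the chain, so it can refer to columns that earlier steps renamed or created. The data and the AboveMean column below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data with a lowercase column name
df = pd.DataFrame({"score": [80, 90]})

result = (df
          .rename(columns={"score": "Score"})
          # x is the renamed DataFrame, so x["Score"] exists here
          .assign(ScoreSquared=lambda x: x["Score"] ** 2,
                  AboveMean=lambda x: x["Score"] > x["Score"].mean()))

print(result)
```

Writing df["Score"] instead of the lambda would fail here, because the original df has no column called Score.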

Chaining keeps code clean and easy to read.

Summary

Combine cleaning steps by chaining pandas methods.

Each step fixes one problem; together they clean the data.

Use parentheses and line breaks for neat, readable code.