Building cleaning pipelines with pipe() in Pandas - Step-by-Step Execution

Concept Flow - Building cleaning pipelines with pipe()

Start with raw DataFrame

↓

Define cleaning functions

↓

Apply pipe() with first function

↓

Apply pipe() with second function

↓

...

↓

Get cleaned DataFrame as output

Start with raw data, define cleaning steps as functions, then apply them step-by-step using pipe() to get a clean DataFrame.

Execution Sample

Pandas

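import pandas as pd

def drop_missing(df):
    # Drop every row that has at least one missing value; returns a new DataFrame.
    return df.dropna()

def to_lowercase(df):
    # assign() returns a new DataFrame, so the input is never modified in place.
    return df.assign(Name=df['Name'].str.lower())

raw = pd.DataFrame({'Name': ['Alice', None, 'BOB'], 'Age': [25, 30, None]})
# Each pipe() call passes the current DataFrame to the function and returns its result.
cleaned = raw.pipe(drop_missing).pipe(to_lowercase)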

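This code cleans a DataFrame in two pipe() steps: it first drops every row that contains a missing value, then converts the 'Name' column to lowercase. Because the 'Age' column contains a missing value, pandas stores it as float64, so 25 is held as 25.0.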

Execution Table

Step	DataFrame State	Action	Resulting DataFrame
Start	{'Name': ['Alice', None, 'BOB'], 'Age': [25.0, 30.0, NaN]}	Initial raw DataFrame	{'Name': ['Alice', None, 'BOB'], 'Age': [25.0, 30.0, NaN]}
1	{'Name': ['Alice', None, 'BOB'], 'Age': [25.0, 30.0, NaN]}	Apply drop_missing (drop rows with any NaN)	{'Name': ['Alice'], 'Age': [25.0]}
2	{'Name': ['Alice'], 'Age': [25.0]}	Apply to_lowercase (convert 'Name' to lowercase)	{'Name': ['alice'], 'Age': [25.0]}
End	{'Name': ['alice'], 'Age': [25.0]}	No more pipe steps	{'Name': ['alice'], 'Age': [25.0]}

💡 All pipe functions applied; final cleaned DataFrame obtained.
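
The rows of the Execution Table can be reproduced by running each pipe() step on its own and inspecting the result. The following is a minimal sketch; the names step1 and step2 are introduced here for illustration only and are not part of the original sample.

```python
import pandas as pd

def drop_missing(df):
    # Step 1: remove rows that contain any missing value.
    return df.dropna()

def to_lowercase(df):
    # Step 2: return a new DataFrame with 'Name' lowercased.
    return df.assign(Name=df['Name'].str.lower())

raw = pd.DataFrame({'Name': ['Alice', None, 'BOB'], 'Age': [25, 30, None]})

# Hypothetical intermediate names, one per table row.
step1 = raw.pipe(drop_missing)
step2 = step1.pipe(to_lowercase)

for label, frame in [('Start', raw), ('After 1', step1), ('After 2', step2)]:
    print(label, frame.to_dict(orient='list'))
```

Splitting the chain like this is useful while debugging; once each step behaves as expected, the calls can be chained again in a single expression.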

Variable Tracker

Variable	Start	After 1	After 2	Final
raw	{'Name': ['Alice', None, 'BOB'], 'Age': [25.0, 30.0, NaN]}	{'Name': ['Alice', None, 'BOB'], 'Age': [25.0, 30.0, NaN]}	{'Name': ['Alice', None, 'BOB'], 'Age': [25.0, 30.0, NaN]}	{'Name': ['Alice', None, 'BOB'], 'Age': [25.0, 30.0, NaN]}
cleaned	N/A	{'Name': ['Alice'], 'Age': [25.0]}	{'Name': ['alice'], 'Age': [25.0]}	{'Name': ['alice'], 'Age': [25.0]}
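
The tracker shows raw holding the same values at every step. One way to check this is to compare raw against a copy taken before the pipeline runs. This is a sketch; snapshot and raw_unchanged are illustrative names, not part of the original sample.

```python
import pandas as pd

def drop_missing(df):
    return df.dropna()

def to_lowercase(df):
    return df.assign(Name=df['Name'].str.lower())

raw = pd.DataFrame({'Name': ['Alice', None, 'BOB'], 'Age': [25, 30, None]})

# Hypothetical helper variable: an independent copy taken before cleaning.
snapshot = raw.copy()

cleaned = raw.pipe(drop_missing).pipe(to_lowercase)

# equals() treats missing values in the same positions as equal.
raw_unchanged = raw.equals(snapshot)
print('raw unchanged:', raw_unchanged)
print('rows in raw:', len(raw), '| rows in cleaned:', len(cleaned))
```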

Key Moments - 3 Insights

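Why does the DataFrame lose rows after the first pipe step?
drop_missing calls dropna(), which removes every row containing at least one missing value. Two of the three rows have a missing 'Name' or 'Age', so only the 'Alice' row survives.

Does pipe() change the original DataFrame?
pipe() does not modify the DataFrame itself; it calls the function with it and returns the result. Here raw stays unchanged because dropna() returns a new DataFrame, and the later step works on that result.

Why do we return the DataFrame inside each cleaning function?
pipe() returns whatever the function returns, so each step must return a DataFrame for the next pipe() call to work on. A function without a return statement yields None, which breaks the chain.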
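
To illustrate the question about returning the DataFrame, here is a small sketch of what happens when a step forgets its return statement. The function broken_lowercase is a hypothetical buggy step written only for this demonstration.

```python
import pandas as pd

def drop_missing(df):
    return df.dropna()

def broken_lowercase(df):
    # Hypothetical buggy step: builds the result but never returns it.
    df.assign(Name=df['Name'].str.lower())

raw = pd.DataFrame({'Name': ['Alice', None, 'BOB'], 'Age': [25, 30, None]})

# As the last step, the bug silently produces None instead of a DataFrame.
result = raw.pipe(drop_missing).pipe(broken_lowercase)
print(result)

# In the middle of a chain, the next .pipe() call fails because None has no pipe().
chain_error = None
try:
    raw.pipe(broken_lowercase).pipe(drop_missing)
except AttributeError as exc:
    chain_error = exc
    print('chain broke:', exc)
```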

Visual Quiz - 3 Questions

Test your understanding

Look at the Execution Table. What is the 'Name' column value after step 1?

A. [None]
B. ['BOB']
C. ['Alice']
D. ['alice']

Concept Snapshot

Use pipe() to chain cleaning functions on DataFrames.
Each function takes a DataFrame and returns a DataFrame.
pipe() passes the DataFrame through each function in order.
This creates clear, readable cleaning pipelines.
Example: df.pipe(func1).pipe(func2)
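
As a sketch of what the chained form means, df.pipe(func1).pipe(func2) gives the same result as the nested call func2(func1(df)), but reads top to bottom. pipe() also forwards extra positional and keyword arguments to the function. The parameterized step lowercase_column below is a hypothetical variant introduced for illustration.

```python
import pandas as pd

def drop_missing(df):
    return df.dropna()

def lowercase_column(df, column):
    # Hypothetical parameterized step: the caller chooses which column to lowercase.
    return df.assign(**{column: df[column].str.lower()})

raw = pd.DataFrame({'Name': ['Alice', None, 'BOB'], 'Age': [25, 30, None]})

# 'Name' is forwarded by pipe() as the second argument of lowercase_column.
piped = raw.pipe(drop_missing).pipe(lowercase_column, 'Name')

# The same computation written as nested function calls.
nested = lowercase_column(drop_missing(raw), 'Name')

print(piped.equals(nested))
```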

Full Transcript

We start with a raw DataFrame containing some missing values and mixed-case names. We define two cleaning functions: one to drop rows with missing data, and another to convert the 'Name' column to lowercase. Using pipe(), we apply these functions one after another. After the first pipe step, rows with missing values are removed. After the second, the remaining names are all lowercase. The original DataFrame remains unchanged because dropna() returns a new DataFrame rather than modifying raw, and the cleaned DataFrame is the final output. This method helps build clear, step-by-step cleaning pipelines.