Using pipe() helps you clean data step-by-step in a clear and organized way. It makes your code easier to read and reuse.
Building cleaning pipelines with pipe() in Pandas
cleaned_data = (dataframe
    .pipe(function1, arg1, arg2)
    .pipe(function2)
    .pipe(function3, arg3=value3)
)
pipe() passes the DataFrame to the function as the first argument.
Any extra positional or keyword arguments you give pipe() after the function are forwarded to that function, following the DataFrame.
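To make the argument passing concrete, here is a minimal sketch. The helpers `keep_rows_above` and `scale_column` and the sample data are hypothetical, invented only for this illustration. It shows that `df.pipe(func, ...)` gives the same result as calling `func(df, ...)` directly. It also shows the tuple form pandas accepts when the function expects the DataFrame under a keyword instead of as its first parameter.

```python
import pandas as pd

# Hypothetical helpers, named only for this illustration.
def keep_rows_above(df, column, threshold=0):
    """Keep rows where `column` is strictly greater than `threshold`."""
    return df[df[column] > threshold]

def scale_column(factor, frame, column):
    """The DataFrame is not the first parameter here; it arrives as `frame`."""
    return frame.assign(**{column: frame[column] * factor})

df = pd.DataFrame({"A": [1, 5, 10], "B": [2, 4, 6]})

# df.pipe(func, *args, **kwargs) calls func(df, *args, **kwargs).
piped = df.pipe(keep_rows_above, "A", threshold=3)
direct = keep_rows_above(df, "A", threshold=3)
same_result = piped.equals(direct)

# Tuple form: (callable, name of the keyword that should receive the DataFrame).
scaled = df.pipe((scale_column, "frame"), 10, column="B")
```

The tuple form is mostly useful for functions you did not write yourself; for your own cleaning steps, taking the DataFrame as the first parameter keeps the pipe() calls shorter.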
A function with no extra arguments, passed to pipe():
def drop_missing(df):
    return df.dropna()

cleaned = df.pipe(drop_missing)
A function that takes an extra argument, passed to pipe():
def select_columns(df, cols):
    return df[cols]

cleaned = df.pipe(select_columns, ['A', 'B'])
Several functions chained with pipe():
cleaned = (df
    .pipe(drop_missing)
    .pipe(select_columns, ['A', 'B'])
)
This program creates a small dataset with missing values and extra columns. It then cleans the data by dropping rows with missing values, selecting only the 'Name' and 'Age' columns, and renaming them. Each pipe() call is one step, so the pipeline reads top to bottom in the order the steps run.
import pandas as pd

def drop_missing(df):
    return df.dropna()

def select_columns(df, cols):
    return df[cols]

def rename_columns(df, new_names):
    return df.rename(columns=new_names)

# Sample data with missing values and extra columns
data = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, None, 30, 22],
    'City': ['NY', 'LA', 'SF', 'LA'],
    'Score': [85, 90, 88, 92]
})

# Cleaning pipeline using pipe()
cleaned_data = (data
    .pipe(drop_missing)
    .pipe(select_columns, ['Name', 'Age'])
    .pipe(rename_columns, {'Name': 'Full Name', 'Age': 'Age Years'})
)

print(cleaned_data)
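The cleaned result of the program above keeps the original row labels (0 and 3), and its ages are floats because the 'Age' column contained a missing value. If that matters, one more step can tidy it up. The sketch below assumes a hypothetical `finalize` helper, which is not part of the original program, and rebuilds the cleaned frame by hand so the block runs on its own.

```python
import pandas as pd

# Rebuilt by hand to match the cleaned result of the program above.
cleaned_data = pd.DataFrame(
    {"Full Name": ["Alice", "David"], "Age Years": [25.0, 22.0]},
    index=[0, 3],
)

def finalize(df, int_columns):
    """Hypothetical extra step: renumber the rows and restore integer columns."""
    return df.reset_index(drop=True).astype({col: "int64" for col in int_columns})

tidy = cleaned_data.pipe(finalize, ["Age Years"])
print(tidy)
```

In the full program this would be one more `.pipe(finalize, ['Age Years'])` line at the end of the chain. The integer cast is only safe after the rows with missing ages have been dropped.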
Each function used with pipe() should take a DataFrame as the first argument and return a DataFrame.
You can add as many cleaning steps as you want by chaining pipe() calls.
Because each step is an ordinary function, you can test it on its own and reuse it in other pipelines.
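One way to reuse a pipeline, sketched here under assumptions, is to store the steps in a list and apply them in order to any DataFrame. The `run_pipeline` helper, the `CLEANING_STEPS` list, and the two sample frames are hypothetical names introduced for this example; `drop_missing` and `select_columns` are the functions defined earlier.

```python
import pandas as pd

def drop_missing(df):
    return df.dropna()

def select_columns(df, cols):
    return df[cols]

# Each step is (function, positional arguments, keyword arguments).
CLEANING_STEPS = [
    (drop_missing, (), {}),
    (select_columns, (["Name", "Age"],), {}),
]

def run_pipeline(df, steps):
    """Hypothetical helper: apply each step in order through pipe()."""
    for func, args, kwargs in steps:
        df = df.pipe(func, *args, **kwargs)
    return df

# Two made-up datasets that need the same cleaning.
january = pd.DataFrame(
    {"Name": ["Alice", None], "Age": [25, 30], "City": ["NY", "SF"]}
)
february = pd.DataFrame(
    {"Name": ["Bob", "Cara"], "Age": [None, 41], "City": ["LA", "LA"]}
)

clean_january = run_pipeline(january, CLEANING_STEPS)
clean_february = run_pipeline(february, CLEANING_STEPS)
```

Keeping the steps as data makes it easy to add, remove, or reorder them without touching the code that runs the pipeline. A plain chain of pipe() calls is still simpler when the pipeline is used only once.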
pipe() helps you write clear, step-by-step data cleaning code.
You can chain multiple cleaning functions easily with pipe().
Functions used with pipe() should accept and return DataFrames.