Pandas, data, ~30 mins

Building cleaning pipelines with pipe() in Pandas - Mini Project: Build & Apply

Building cleaning pipelines with pipe()
📖 Scenario: You work as a data analyst for a small online store. You receive raw sales data that must be cleaned before analysis: it has missing values and inconsistently capitalized product names. Using pandas, you will clean this data step by step by building a pipeline with the pipe() method, which lets you apply several cleaning functions in a clear, ordered sequence.
🎯 Goal: Create a pandas DataFrame with raw sales data, define cleaning functions, and use pipe() to apply these functions in a pipeline. Finally, display the cleaned DataFrame.
📋 What You'll Learn
Create a pandas DataFrame with given sales data
Define a function to fill missing values
Define a function to standardize product names
Use pipe() to apply cleaning functions in sequence
Print the cleaned DataFrame
💡 Why This Matters
🌍 Real World
Cleaning data is a common first step in data science projects. Using pipelines with pipe() helps keep your code organized and easy to read.
💼 Career
Data analysts and scientists often clean messy data before analysis. Knowing how to build pipelines with pipe() is a valuable skill for writing clean, maintainable code.
1
Create the raw sales DataFrame
Create a pandas DataFrame called sales with these exact columns and data:
Product: ['apple', 'Banana', None, 'orange', 'banana']
Quantity: [10, 5, 8, None, 7]
Price: [1.2, 0.5, 0.8, 1.0, None]
Hint:

Use pd.DataFrame with a dictionary of lists for columns.
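
One way to write this step is sketched below. It uses only the column names and values listed above. Note that because Quantity contains a None, pandas stores that column as floating point, with the missing entry shown as NaN.

```python
import pandas as pd

# Raw sales data exactly as given in the step description.
# None marks a missing value in each column.
sales = pd.DataFrame({
    "Product": ["apple", "Banana", None, "orange", "banana"],
    "Quantity": [10, 5, 8, None, 7],
    "Price": [1.2, 0.5, 0.8, 1.0, None],
})

# Count the missing values per column before cleaning.
missing_per_column = sales.isna().sum()
print(missing_per_column)
```

Each column has exactly one missing entry, which is what the cleaning functions in the next step are meant to handle.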

2
Define cleaning functions
Define two functions:
1. fill_missing(df) that fills missing values in Quantity and Price columns with 0.
2. standardize_product(df) that converts all Product names to lowercase strings and fills missing Product values with 'unknown'.
Hint:

Use fillna(0) for missing numbers and fillna('unknown') plus str.lower() for product names.
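
A minimal sketch of the two functions follows. Working on a copy of the input is a design choice made here, not a requirement of the exercise; it keeps the original DataFrame unchanged so the raw data stays available for comparison.

```python
import pandas as pd

def fill_missing(df):
    """Return a copy of df with missing Quantity and Price set to 0."""
    df = df.copy()  # assumption: leave the caller's DataFrame untouched
    df[["Quantity", "Price"]] = df[["Quantity", "Price"]].fillna(0)
    return df

def standardize_product(df):
    """Return a copy of df with Product lowercased and missing names set to 'unknown'."""
    df = df.copy()
    # Fill first so every entry is a string, then lowercase.
    df["Product"] = df["Product"].fillna("unknown").str.lower()
    return df
```

Both functions take a DataFrame and return a DataFrame, which is what allows them to be chained with pipe().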

3
Build the cleaning pipeline with pipe()
Use the pipe() method on sales to apply fill_missing first, then standardize_product. Save the result in a new variable called cleaned_sales.
Hint:

Chain pipe() calls: sales.pipe(fill_missing).pipe(standardize_product)
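
The whole pipeline can be sketched as follows. The data and the two functions are repeated so the example runs on its own; the function bodies are one possible implementation (including the choice to copy the input), not the only valid one.

```python
import pandas as pd

sales = pd.DataFrame({
    "Product": ["apple", "Banana", None, "orange", "banana"],
    "Quantity": [10, 5, 8, None, 7],
    "Price": [1.2, 0.5, 0.8, 1.0, None],
})

def fill_missing(df):
    df = df.copy()
    df[["Quantity", "Price"]] = df[["Quantity", "Price"]].fillna(0)
    return df

def standardize_product(df):
    df = df.copy()
    df["Product"] = df["Product"].fillna("unknown").str.lower()
    return df

# df.pipe(func) calls func(df), so the chain reads top to bottom
# in the order the steps are applied.
cleaned_sales = (
    sales
    .pipe(fill_missing)
    .pipe(standardize_product)
)

# The same result written as nested calls, which reads inside-out.
nested_result = standardize_product(fill_missing(sales))
```

The chained form and the nested form produce the same DataFrame; the benefit of pipe() is readability, since each step sits on its own line in execution order.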

4
Display the cleaned DataFrame
Print the cleaned_sales DataFrame to see the cleaned data.
Hint:

Use print(cleaned_sales) to show the cleaned DataFrame.