Missing data strategies decision in Pandas - Mini Project: Build & Apply

Missing Data Strategies Decision
📖 Scenario: You are working as a data analyst for a small online store. You have collected customer data, but some values are missing. You need to decide how to handle these missing values before analysis.
🎯 Goal: Build a simple program that loads a dataset with missing values, sets a threshold for acceptable missing data, applies a strategy to handle missing values, and shows the cleaned data.
📋 What You'll Learn
Create a pandas DataFrame with missing values
Set a threshold variable for maximum allowed missing values per column
Use a method to drop columns exceeding the threshold
Fill remaining missing values with a fixed value
Print the cleaned DataFrame
💡 Why This Matters
🌍 Real World
Handling missing data is a common task in data science to prepare datasets for analysis or machine learning.
💼 Career
Data analysts and scientists must decide how to handle missing data to ensure accurate and reliable results.
Step 1: Create a DataFrame with missing values
Create a pandas DataFrame called df with these exact columns and values:
'CustomerID': [101, 102, 103, 104, 105],
'Age': [25, None, 22, None, 28],
'City': ['New York', 'Los Angeles', None, 'Chicago', 'Houston'],
'PurchaseAmount': [250.0, 300.0, None, 150.0, None].
Need a hint?

Use pd.DataFrame with a dictionary where keys are column names and values are lists including None for missing data.
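
As a minimal sketch of this step, the snippet below builds df from a dictionary of lists. The names data and missing_counts are illustrative helpers that the exercise does not ask for; the per-column count is included only to confirm where the gaps are.

```python
import pandas as pd

# Dictionary keys become column names; None marks a missing entry.
# "data" is a helper name used only in this sketch.
data = {
    "CustomerID": [101, 102, 103, 104, 105],
    "Age": [25, None, 22, None, 28],
    "City": ["New York", "Los Angeles", None, "Chicago", "Houston"],
    "PurchaseAmount": [250.0, 300.0, None, 150.0, None],
}
df = pd.DataFrame(data)

# Count missing values per column (illustrative check, not part of the task).
missing_counts = df.isna().sum()
print(missing_counts)
```

Because None cannot be stored in an integer column, pandas converts Age to floating point and stores the gaps as NaN.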

Step 2: Set a threshold for missing data
Create a variable called max_missing and set it to 2. This will be the maximum number of missing values allowed per column.
Need a hint?

Just create a variable max_missing and assign the number 2.
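
The sketch below shows one way to see what the threshold means for this dataset. It rebuilds df so that it runs on its own; missing_counts and over_limit are illustrative names, not part of the exercise.

```python
import pandas as pd

df = pd.DataFrame({
    "CustomerID": [101, 102, 103, 104, 105],
    "Age": [25, None, 22, None, 28],
    "City": ["New York", "Los Angeles", None, "Chicago", "Houston"],
    "PurchaseAmount": [250.0, 300.0, None, 150.0, None],
})

# Maximum number of missing values allowed per column.
max_missing = 2

# Illustrative check: which columns have MORE than max_missing gaps?
missing_counts = df.isna().sum()
over_limit = missing_counts[missing_counts > max_missing].index.tolist()
print(over_limit)
```

With this data no column has more than two missing values, so over_limit is an empty list and every column is within the limit.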

Step 3: Drop columns with too many missing values and fill remaining missing values
Create a new DataFrame called cleaned_df by dropping columns from df that have more than max_missing missing values. Then fill the remaining missing values in cleaned_df with 0.
Need a hint?

Use dropna with axis=1 and thresh=len(df) - max_missing to drop columns. Then use fillna(0) to fill missing values.
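
A possible sketch of this step follows. In dropna, thresh is the minimum number of non-missing values a column needs in order to be kept, so len(df) - max_missing translates "at most max_missing gaps" into that form. The cast of City to object dtype is an assumption added here, not part of the exercise: it is meant to ensure the integer fill value 0 is accepted on pandas versions that infer a dedicated string dtype for text columns.

```python
import pandas as pd

df = pd.DataFrame({
    "CustomerID": [101, 102, 103, 104, 105],
    "Age": [25, None, 22, None, 28],
    "City": ["New York", "Los Angeles", None, "Chicago", "Houston"],
    "PurchaseAmount": [250.0, 300.0, None, 150.0, None],
})
# Assumption (not in the exercise): keep City as a generic object column
# so that filling it with the integer 0 works regardless of pandas version.
df = df.astype({"City": object})

max_missing = 2

# A column is kept only if it has at least this many non-missing values.
min_non_missing = len(df) - max_missing  # 5 - 2 = 3

# Drop columns with too many gaps, then fill whatever gaps remain with 0.
cleaned_df = df.dropna(axis=1, thresh=min_non_missing)
cleaned_df = cleaned_df.fillna(0)
```

With max_missing = 2 all four columns survive, since each has at least three non-missing values. If max_missing were 1, the threshold would be 4 and both Age and PurchaseAmount would be dropped. Note that fillna(0) also writes 0 into the text column City, which mixes types; a text placeholder may be a better fit for real data.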

Step 4: Print the cleaned DataFrame
Print the cleaned_df DataFrame to see the result after handling missing data.
Need a hint?

Use print(cleaned_df) to show the cleaned data.
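
Putting the four steps together, a complete run might look like the sketch below. The object cast on City and the remaining_missing check are assumptions added for illustration; the exercise itself only asks for the print call.

```python
import pandas as pd

df = pd.DataFrame({
    "CustomerID": [101, 102, 103, 104, 105],
    "Age": [25, None, 22, None, 28],
    "City": ["New York", "Los Angeles", None, "Chicago", "Houston"],
    "PurchaseAmount": [250.0, 300.0, None, 150.0, None],
})
# Assumption (not in the exercise): object dtype lets City accept the fill value 0.
df = df.astype({"City": object})

max_missing = 2

# Keep columns with at most max_missing gaps, then fill the rest with 0.
cleaned_df = df.dropna(axis=1, thresh=len(df) - max_missing).fillna(0)

# Show the result of the cleaning steps.
print(cleaned_df)

# Illustrative sanity check: no missing values should remain.
remaining_missing = int(cleaned_df.isna().sum().sum())
print("Missing values left:", remaining_missing)
```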