Combining multiple cleaning steps in Pandas - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of combined cleaning steps on a DataFrame

What is the output of the following code after applying multiple cleaning steps?

Pandas
import pandas as pd

data = {'Name': ['Alice', 'Bob', None, 'David'],
        'Age': [25, None, 30, 22],
        'City': ['New York', 'Los Angeles', 'New York', None]}
df = pd.DataFrame(data)

# Cleaning steps
result = (df.dropna(subset=['Name'])
            .fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})
            .reset_index(drop=True))
print(result)
A
    Name   Age         City
0  Alice  25.0     New York
1    Bob  27.0  Los Angeles
2  David  22.0      Unknown
B
    Name        Age         City
0  Alice  25.000000     New York
1    Bob  25.666667  Los Angeles
2  David  22.000000     New York
C
    Name   Age         City
0  Alice  25.0     New York
1    Bob   NaN  Los Angeles
2  David  22.0      Unknown
D
    Name        Age         City
0  Alice  25.000000     New York
1    Bob  25.666667  Los Angeles
2  David  22.000000      Unknown
💡 Hint

Look carefully at how missing values are handled and what the mean age is.
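
One detail worth checking when a statistic is computed inside a chain: an expression such as `df['Age'].mean()` is evaluated on the original `df`, not on the intermediate frame that `dropna` returns. The sketch below illustrates this on a small hypothetical frame (`toy` and its values are invented for illustration and are not the challenge data).

```python
import pandas as pd

# Hypothetical data, not the challenge's DataFrame.
toy = pd.DataFrame({
    "Name": ["Ann", None, "Cy"],
    "Age": [20.0, 40.0, None],
})

# Mean over the ORIGINAL frame: the row with a missing Name still counts.
mean_all = toy["Age"].mean()

# Mean over only the rows that survive dropna.
mean_kept = toy.dropna(subset=["Name"])["Age"].mean()

# Inside the chain, toy["Age"].mean() still refers to the original frame,
# so the missing Age is filled with mean_all, not mean_kept.
chained = (toy.dropna(subset=["Name"])
              .fillna({"Age": toy["Age"].mean()})
              .reset_index(drop=True))

print(mean_all, mean_kept)
print(chained)
```

If the fill value should reflect only the rows that are kept, one option is to assign the filtered frame to a variable first and compute the mean from that.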

Data Output
intermediate
Number of rows after multiple cleaning steps

After applying the cleaning steps below, how many rows remain in the DataFrame?

Pandas
import pandas as pd

data = {'Product': ['A', 'B', 'C', 'D', None],
        'Price': [10, None, 15, 20, 25],
        'Stock': [100, 200, None, 150, 300]}
df = pd.DataFrame(data)

cleaned = (df.dropna(subset=['Product'])
             .fillna({'Price': 0, 'Stock': 0})
             .query('Price > 0 and Stock > 0'))
print(len(cleaned))
A. 3
B. 2
C. 4
D. 5
💡 Hint

Check which rows are dropped and which remain after filtering.
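
A simple way to follow a chain like this is to keep each intermediate result and compare row counts. The sketch below does that on hypothetical data (`items` and its columns are invented for illustration); note that a value filled in as `0` will not pass a strict `> 0` filter.

```python
import pandas as pd

# Hypothetical data, not the challenge's DataFrame.
items = pd.DataFrame({
    "Item": ["pen", "ink", None],
    "Price": [2.0, None, 4.0],
    "Stock": [5.0, 8.0, 1.0],
})

step1 = items.dropna(subset=["Item"])             # drops the row with no Item
step2 = step1.fillna({"Price": 0, "Stock": 0})    # missing Price becomes 0
step3 = step2.query("Price > 0 and Stock > 0")    # the filled-in 0 fails Price > 0

# Row counts after each step make it clear where rows disappear.
counts = [len(items), len(step1), len(step2), len(step3)]
print(counts)
```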

🔧 Debug
advanced
Identify the error in combined cleaning steps

What error does the following code raise?

Pandas
import pandas as pd

data = {'A': [1, 2, None], 'B': [4, None, 6]}
df = pd.DataFrame(data)

result = df.bfill().dropna(subset=['C'])
A. KeyError: ['C']
B. TypeError: fillna() got an unexpected keyword argument 'method'
C. AttributeError: 'DataFrame' object has no attribute 'backfill'
D. No error, returns DataFrame
💡 Hint

Check if the column 'C' exists before dropping rows based on it.
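
As a rough sketch of the check the hint describes, the labels intended for `subset` can be compared against `frame.columns` before calling `dropna`. The frame and the `wanted` list below are hypothetical, and silently skipping absent labels is only one possible policy; raising a clearer error may be preferable in real pipelines.

```python
import pandas as pd

# Hypothetical data, not the challenge's DataFrame.
frame = pd.DataFrame({"A": [1, None], "B": [None, 4]})
wanted = ["A", "C"]   # "C" is deliberately absent from the frame

# Split the requested labels into those that exist and those that do not.
present = [col for col in wanted if col in frame.columns]
missing = [col for col in wanted if col not in frame.columns]

# Only pass labels that actually exist to dropna.
cleaned = frame.dropna(subset=present)

print("skipped labels:", missing)
print(cleaned)
```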

Visualization
advanced
Resulting plot after cleaning and grouping

After cleaning and grouping the data, which per-category sums does the bar plot show? The options give the grouped result in dictionary form.

Pandas
import pandas as pd
import matplotlib.pyplot as plt

data = {'Category': ['X', 'Y', 'X', 'Z', None],
        'Value': [10, 20, None, 15, 5]}
df = pd.DataFrame(data)

cleaned = df.dropna(subset=['Category']).fillna({'Value': 0})
grouped = cleaned.groupby('Category').sum()

ax = grouped.plot(kind='bar', legend=False)
plt.ylabel('Sum of Values')
plt.title('Sum of Values by Category')
plt.close()  # Prevent actual plot display in test

print(grouped)
A. {'Value': {'X': 10.0, 'Y': 20.0, 'Z': 15.0}}
B. {'Value': {'X': 10.0, 'Y': 20.0, 'Z': 0.0}}
C. {'Value': {'X': 0.0, 'Y': 20.0, 'Z': 15.0}}
D. {'Value': {'X': 10.0, 'Y': 0.0, 'Z': 15.0}}
💡 Hint

Look at how missing values in 'Value' are replaced and how grouping sums values.
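
The same pattern can be traced on a small hypothetical frame (`sales` and its values are invented for illustration). Rows with a missing group key are removed before grouping, a missing value filled with `0` adds nothing to its group's total, and `to_dict()` on the grouped frame yields the nested `{column: {group: sum}}` shape.

```python
import pandas as pd

# Hypothetical data, not the challenge's DataFrame.
sales = pd.DataFrame({
    "Region": ["N", "N", "S", None],
    "Units": [3.0, None, 7.0, 9.0],
})

# Drop rows without a Region, then treat missing Units as 0.
cleaned = sales.dropna(subset=["Region"]).fillna({"Units": 0})

# Sum per group; the 9.0 from the dropped row is not counted anywhere.
grouped = cleaned.groupby("Region").sum()
as_dict = grouped.to_dict()

print(grouped)
print(as_dict)
```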

🚀 Application
expert
Combining multiple cleaning steps for a real dataset

You have a DataFrame with columns 'ID', 'Score', and 'Grade'. You want to:

  • Remove rows where 'ID' is missing.
  • Fill missing 'Score' with the median score.
  • Replace missing 'Grade' with 'Incomplete'.
  • Drop rows where 'Score' is less than 50 after filling.

Which code snippet performs all these steps in the order listed?

A. df.fillna({'Score': df['Score'].median(), 'Grade': 'Incomplete'}).dropna(subset=['ID']).query('Score >= 50')
B. df.query('Score >= 50').dropna(subset=['ID']).fillna({'Score': df['Score'].median(), 'Grade': 'Incomplete'})
C. df.dropna(subset=['ID']).fillna({'Score': df['Score'].median(), 'Grade': 'Incomplete'}).query('Score >= 50')
D. df.dropna(subset=['ID']).query('Score >= 50').fillna({'Score': df['Score'].median(), 'Grade': 'Incomplete'})
💡 Hint

Think about the order of operations: drop missing IDs first, then fill missing values, then filter by score.
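
To see why the position of the filter matters, the sketch below runs two orderings on a hypothetical frame (`exam` and its values are invented for illustration). A comparison such as `Score >= 50` is false for a missing value, so filtering before filling discards rows that the fill step was meant to rescue.

```python
import pandas as pd

# Hypothetical data, not a real dataset.
exam = pd.DataFrame({
    "ID": [1, 2, None, 4],
    "Score": [80.0, None, 90.0, 40.0],
    "Grade": ["A", None, "A", "F"],
})

# Median of the non-missing scores in the original frame (80, 90, 40).
median_score = exam["Score"].median()

# Fill first, then filter: the row with a missing Score is kept after filling.
fill_then_filter = (exam.dropna(subset=["ID"])
                        .fillna({"Score": median_score, "Grade": "Incomplete"})
                        .query("Score >= 50"))

# Filter first, then fill: the missing Score fails the comparison and is lost.
filter_then_fill = (exam.dropna(subset=["ID"])
                        .query("Score >= 50")
                        .fillna({"Score": median_score, "Grade": "Incomplete"}))

print(fill_then_filter)
print(filter_then_fill)
```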