Challenge - 5 Problems
Missing Data Master
❓ Predict Output
Intermediate · 2:00 time limit
What is the output of this code with missing data?
Consider the following pandas DataFrame with missing values. What will be the output of the code below?
Pandas
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, np.nan, 4], 'B': [5, np.nan, np.nan, 8]})
result = df['A'].mean()
print(round(result, 2))
```
💡 Hint
Remember that pandas mean() skips missing values by default.
Explanation
By default, mean() skips NaN values (skipna=True), so df['A'].mean() averages only 1, 2, and 4: (1 + 2 + 4) / 3 = 2.333…, which round(result, 2) prints as 2.33. Column B is never used.
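The sketch below contrasts the default skipna=True with skipna=False on the same values as column A. The standalone Series s is introduced only for illustration; it is not part of the quiz code.

```python
import numpy as np
import pandas as pd

# Same values as column A in the question
s = pd.Series([1, 2, np.nan, 4])

print(round(s.mean(), 2))    # default skipna=True: (1 + 2 + 4) / 3 -> 2.33
print(s.mean(skipna=False))  # NaN is not skipped, so the result is nan
print(s.count())             # count() also ignores NaN -> 3
```

Passing skipna=False is a quick way to surface missing values that the default would otherwise hide.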
❓ Data Output
Intermediate · 2:00 time limit
How many rows remain after dropping missing data?
Given this DataFrame, how many rows remain after dropping all rows with any missing values?
Pandas
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'X': [1, np.nan, 3, 4], 'Y': [np.nan, 2, 3, 4]})
df_clean = df.dropna()
print(len(df_clean))
```
💡 Hint
dropna() removes rows with any NaN values.
Explanation
dropna() drops every row that contains at least one NaN. Row 0 is missing Y and row 1 is missing X, while rows 2 and 3 are complete, so 2 rows remain and the code prints 2.
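To check which rows dropna() removes before calling it, you can flag incomplete rows with isna().any(axis=1). The sketch below does that for the quiz DataFrame and also tries the how='all' and subset= variants of dropna() for comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': [1, np.nan, 3, 4], 'Y': [np.nan, 2, 3, 4]})

# Flag rows that contain at least one NaN
has_missing = df.isna().any(axis=1)
print(has_missing.tolist())           # [True, True, False, False]

# Default dropna() (how='any') keeps only complete rows
df_clean = df.dropna()
print(df_clean.index.tolist())        # [2, 3]

# Variations: drop only all-NaN rows, or look at one column only
print(len(df.dropna(how='all')))      # 4: no row is entirely NaN
print(len(df.dropna(subset=['X'])))   # 3: only row 1 lacks X
```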
🔧 Debug
Advanced · 2:00 time limit
What error does this code raise when filling missing data?
What error, if any, will this code produce when it fills the missing values?
Pandas
```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None], 'B': ['x', 'y', None]})
df['B'] = df['B'].fillna(0)
print(df)
```
💡 Hint
fillna() accepts any scalar as the fill value, even when the column holds strings.
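Explanation
No error is raised. Column B has object dtype (the default for string data in pandas 2.x), so fillna(0) simply stores the integer 0 where None was, leaving a column that mixes strings and an integer. Column A is float64 and still shows NaN, because only B was filled.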
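Inspecting the dtypes before and after the fill makes the behavior visible. This sketch pins column B to dtype=object explicitly (an assumption added here, not in the quiz code) so that it behaves like a pandas 2.x string column regardless of version, since newer pandas releases may infer a dedicated string dtype instead.

```python
import pandas as pd

# dtype=object is pinned so B behaves like a pandas 2.x string column
df = pd.DataFrame({'A': [1, 2, None],
                   'B': pd.Series(['x', 'y', None], dtype=object)})
print(df.dtypes)             # A: float64 (None became NaN), B: object

df['B'] = df['B'].fillna(0)
print(df['B'].tolist())      # ['x', 'y', 0]: the int 0 now sits among strings
print(df['A'].isna().sum())  # 1: column A was not touched
```

Filling a text column with a string placeholder such as 'missing' would keep the column homogeneous; the integer 0 works, but mixes types.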
🧠 Conceptual
Advanced · 2:00 time limit
Why is handling missing data important before modeling?
Which of the following best explains why handling missing data is important before building a machine learning model?
💡 Hint
Think about how missing values affect calculations and predictions.
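Explanation
Missing values break or distort modeling: many estimators reject NaN inputs outright, NaN propagates through ordinary arithmetic, and dropping or filling values carelessly can bias the training data. Handling them deliberately (by dropping, imputing, or flagging them) before training avoids both errors and hidden bias.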
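As a small illustration of the hint, the sketch below shows how a single NaN behaves in plain NumPy arithmetic and comparisons. The array values are made up for the example.

```python
import numpy as np

values = np.array([1.0, 2.0, np.nan, 4.0])

print(values.sum())              # nan: one NaN poisons the whole sum
print(np.nansum(values))         # 7.0: the NaN-aware variant skips it
print((values > 2).tolist())     # [False, False, False, True]: NaN compares as False
```

Because the comparison silently returns False, a row with a missing value can land on the wrong side of a filter without any error being raised.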
🚀 Application
Expert · 3:00 time limit
Which option produces the correct imputed DataFrame?
Given this DataFrame with missing values, which code snippet correctly fills missing numeric values with the column mean and missing categorical values with the mode?
Pandas
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'num': [1, 2, np.nan, 4], 'cat': ['a', np.nan, 'a', 'b']})
```
💡 Hint
Apply the mean to the numeric column and the mode to the categorical column, separately.
Explanation
Option A is correct: it fills the NaN in num with that column's mean, (1 + 2 + 4) / 3 ≈ 2.33, and the NaN in cat with its mode, 'a'. The other options either misuse fillna() or apply the wrong statistic to a column.
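The answer options are not reproduced here, so the following is one way to write this imputation, not necessarily Option A verbatim. It fills num with its mean and cat with its mode, working on a copy so the original df keeps its NaNs.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'num': [1, 2, np.nan, 4], 'cat': ['a', np.nan, 'a', 'b']})

filled = df.copy()
# Numeric column: replace NaN with the column mean, (1 + 2 + 4) / 3
filled['num'] = filled['num'].fillna(filled['num'].mean())
# Categorical column: mode() returns a Series (ties are possible), so take the first value
filled['cat'] = filled['cat'].fillna(filled['cat'].mode().iloc[0])

print(filled)
```

When several categories tie, mode() returns all of them, and taking the first is a simple but arbitrary tie-break.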