
Why handling missing data matters in Pandas - Challenge Your Understanding

Challenge - 5 Problems
Predict Output
intermediate
What is the output of this code with missing data?
Consider the following pandas DataFrame with missing values. What will be the output of the code below?
Pandas
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 2, np.nan, 4], 'B': [5, np.nan, np.nan, 8]})
result = df['A'].mean()
print(round(result, 2))
A. 2.25
B. 2.33
C. nan
D. 3.0
💡 Hint
Remember that pandas mean() skips missing values by default.
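The hint can be checked on a separate, made-up Series (not the one from the question). This is a minimal sketch, assuming only the documented `skipna` parameter of `Series.mean()`; the name `readings` and its values are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with one missing value.
readings = pd.Series([10.0, np.nan, 30.0])

# Default behaviour: NaN is skipped, so the mean uses the two observed values.
mean_skipped = readings.mean()

# With skipna=False the NaN propagates into the result.
mean_strict = readings.mean(skipna=False)

print(mean_skipped)  # 20.0
print(mean_strict)   # nan
```

Passing `skipna=False` is a useful guard when a silent skip would hide a data-quality problem.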
Data Output
intermediate
How many rows remain after dropping missing data?
Given this DataFrame, how many rows remain after dropping all rows with any missing values?
Pandas
import pandas as pd
import numpy as np
df = pd.DataFrame({'X': [1, np.nan, 3, 4], 'Y': [np.nan, 2, 3, 4]})
df_clean = df.dropna()
print(len(df_clean))
A. 1
B. 2
C. 3
D. 4
💡 Hint
dropna() removes rows with any NaN values.
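The default is only one of several row-dropping rules. The sketch below uses a made-up frame (`df_demo`, not the one from the question) to compare the default `how="any"` with the `how="all"` and `subset` options of `DataFrame.dropna()`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one complete row, one partial row, one empty row.
df_demo = pd.DataFrame({
    "p": [1.0, np.nan, np.nan],
    "q": [4.0, 5.0, np.nan],
})

# Default (how="any"): a row is dropped if any of its values is missing.
rows_any = len(df_demo.dropna())

# how="all": a row is dropped only if every value in it is missing.
rows_all = len(df_demo.dropna(how="all"))

# subset: only the listed columns are inspected for missing values.
rows_subset = len(df_demo.dropna(subset=["q"]))

print(rows_any, rows_all, rows_subset)  # 1 2 2
```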
🔧 Debug
advanced
Does this code raise an error when filling missing data?
What happens when this code runs and tries to fill the missing values?
Pandas
import pandas as pd
df = pd.DataFrame({'A': [1, 2, None], 'B': ['x', 'y', None]})
df['B'] = df['B'].fillna(0)
print(df)
A. TypeError
B. KeyError
C. ValueError
D. No error; missing values are filled with 0
💡 Hint
fillna accepts any scalar as the fill value, even when an object-dtype column holds strings.
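A side effect worth seeing is that a numeric fill value leaves a text column with mixed types. This is a minimal sketch on a made-up Series, with `dtype=object` set explicitly as an assumption so the behaviour does not depend on how a given pandas version infers string columns.

```python
import pandas as pd

# Hypothetical label column stored with object dtype.
labels = pd.Series(["x", "y", None], dtype=object, name="label")

# Filling with a number works, but the column now mixes str and int.
filled_number = labels.fillna(0)

# Filling with a string keeps every value the same type.
filled_text = labels.fillna("missing")

print(filled_number.tolist())  # ['x', 'y', 0]
print(filled_text.tolist())    # ['x', 'y', 'missing']
```

A string placeholder is usually the safer choice when later steps expect text in that column.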
🧠 Conceptual
advanced
Why is handling missing data important before modeling?
Which of the following best explains why handling missing data is important before building a machine learning model?
A. Models ignore missing data automatically, so no need to handle it.
B. Missing data always improves model accuracy.
C. Missing data can cause models to crash or produce biased results.
D. Handling missing data slows down model training without benefits.
💡 Hint
Think about how missing values affect calculations and predictions.
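Both effects can be seen without any modeling library. The sketch below uses invented numbers: a single NaN turns a plain NumPy dot product into NaN, and a made-up income column whose large values are missing yields an observed mean well below the true one.

```python
import numpy as np
import pandas as pd

# Hypothetical feature vector with one missing entry, and unit weights.
feature = np.array([0.5, np.nan, 2.0])
weights = np.array([1.0, 1.0, 1.0])

# One NaN is enough to make the whole weighted sum NaN.
score = np.dot(weights, feature)

# Bias: assume only incomes of 80 or more went unrecorded.
income_true = pd.Series([20, 30, 40, 90, 100])
income_observed = income_true.where(income_true < 80)  # others become NaN

print(score)                   # nan
print(income_true.mean())      # 56.0
print(income_observed.mean())  # 30.0
```

The second part shows why simply skipping missing values is not neutral: when missingness depends on the value itself, the remaining data no longer represents the whole.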
🚀 Application
expert
Which option produces the correct imputed DataFrame?
Given this DataFrame with missing values, which code snippet correctly fills missing numeric values with the column mean and missing categorical values with the mode?
Pandas
import pandas as pd
import numpy as np
df = pd.DataFrame({'num': [1, 2, np.nan, 4], 'cat': ['a', np.nan, 'a', 'b']})
A
df['num'] = df['num'].fillna(df['num'].mean())
df['cat'] = df['cat'].fillna(df['cat'].mode()[0])
print(df)
B
df.fillna(df.mean(), inplace=True)
df.fillna(df.mode().iloc[0], inplace=True)
print(df)
C
df['num'] = df['num'].fillna(df['num'].median())
df['cat'] = df['cat'].fillna(df['cat'].mode()[0])
print(df)
D
df.fillna(method='ffill', inplace=True)
print(df)
💡 Hint
Use mean for numeric and mode for categorical separately.
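The same idea can be generalized to any number of columns. The helper below, `impute_mean_mode`, is a hypothetical name and not part of pandas. It is a sketch that assumes `pandas.api.types.is_numeric_dtype` is an adequate test for "numeric" and that the first mode is an acceptable tie-breaker. The sample frame is invented.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def impute_mean_mode(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy: numeric NaNs get the column mean, others the column mode."""
    out = frame.copy()
    for col in out.columns:
        if is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
        else:
            modes = out[col].mode()
            if not modes.empty:  # an all-missing column has no mode
                out[col] = out[col].fillna(modes.iloc[0])
    return out


# Hypothetical sample data.
sample = pd.DataFrame({
    "height": [150.0, np.nan, 170.0],
    "color": ["red", None, "red"],
})
imputed = impute_mean_mode(sample)
print(imputed)
```

Assigning the result of `fillna` back to the column, rather than calling it with `inplace=True` on a selected column, avoids depending on whether the selection is a view or a copy.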