Choosing a Missing Data Strategy in Pandas - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Choosing the best strategy for missing data in a dataset

You have a dataset with 10% missing values in a column that represents age. The missing values are random and not related to other variables. Which missing data strategy is most appropriate?

A. Fill missing age values with the mean age of the column.
B. Remove all rows with missing age values.
C. Fill missing age values with zero.
D. Leave the missing values as they are.
💡 Hint

Think about preserving as much data as possible and using a simple method for random missing values.
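
Each of the four options corresponds to a short pandas expression. The following is a minimal sketch, assuming a hypothetical `age` column with made-up values (not data from the challenge), so the effect of each strategy can be compared side by side:

```python
import pandas as pd

# Hypothetical ages; None marks a missing value (illustrative data only).
df = pd.DataFrame({"age": [25, None, 35, 40, None, 50]})

mean_filled = df["age"].fillna(df["age"].mean())  # option A: mean imputation
rows_dropped = df.dropna(subset=["age"])          # option B: drop incomplete rows
zero_filled = df["age"].fillna(0)                 # option C: constant fill with 0
untouched = df["age"]                             # option D: leave NaN in place

# Compare how each strategy changes the row count and the column mean.
print(len(df), len(rows_dropped))
print(df["age"].mean(), mean_filled.mean(), zero_filled.mean())
```

Comparing the row count and the column mean before and after each fill is a quick way to see how much data a strategy discards and how far it shifts the distribution.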

Predict Output
intermediate
Output of filling missing values with forward fill

What is the output of the following code?

Pandas
import pandas as pd

data = {'score': [10, None, None, 20, None, 30]}
df = pd.DataFrame(data)
df_filled = df.ffill()  # forward fill; fillna(method='ffill') is deprecated in recent pandas
print(df_filled)
A
   score
0   10.0
1   10.0
2   10.0
3   20.0
4   20.0
5    NaN
B
   score
0   10.0
1    NaN
2    NaN
3   20.0
4    NaN
5   30.0
C
   score
0   10.0
1   10.0
2   10.0
3   20.0
4    NaN
5   30.0
D
   score
0   10.0
1   10.0
2   10.0
3   20.0
4   20.0
5   30.0
💡 Hint

Forward fill replaces missing values with the last known value above.
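
To see how forward fill treats leading gaps and runs of consecutive gaps, here is a small sketch on a hypothetical series that is deliberately different from the challenge data. It uses `Series.ffill()` and `Series.bfill()`, which recent pandas releases recommend over the `method=` argument of `fillna`:

```python
import pandas as pd

# Hypothetical series, not the challenge data; None becomes NaN.
s = pd.Series([None, 1.0, None, None, 4.0])

forward = s.ffill()        # carry the last seen value downward
capped = s.ffill(limit=1)  # fill at most one consecutive missing value
backward = s.bfill()       # carry the next seen value upward

print(forward.isna().tolist())   # shows which positions are still missing
print(capped.isna().tolist())
print(backward.tolist())
```

A leading gap has no earlier value to copy, so forward fill leaves it missing, while `limit` stops stale values from being carried across long gaps.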

Data Output
advanced
Result of dropping rows with missing values

Given the DataFrame below, what is the result after dropping rows with any missing values?

Pandas
import pandas as pd

data = {'A': [1, 2, None, 4], 'B': [None, 2, 3, 4]}
df = pd.DataFrame(data)
df_clean = df.dropna()
print(df_clean)
A
     A    B
1  2.0  2.0
3  4.0  4.0
B
Empty DataFrame with columns A and B and 0 rows
C
     A    B
0  1.0  NaN
1  2.0  2.0
2  NaN  3.0
3  4.0  4.0
D
     A    B
0  1.0  NaN
3  4.0  4.0
💡 Hint

dropna() removes rows with any missing value.
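
The default behavior described in the hint is `how="any"`. A minimal sketch on a hypothetical frame (not the challenge data) contrasts it with two other common `dropna` settings:

```python
import pandas as pd

# Hypothetical frame, not the challenge data.
df = pd.DataFrame({
    "x": [1.0, None, 3.0, None],
    "y": [None, 6.0, 7.0, None],
})

any_missing = df.dropna()           # default how="any": drop rows with at least one NaN
all_missing = df.dropna(how="all")  # drop only rows where every value is NaN
only_x = df.dropna(subset=["x"])    # look for NaN in column x only

print(any_missing.index.tolist())
print(all_missing.index.tolist())
print(only_x.index.tolist())
```

Note that `dropna` keeps the original index labels of the surviving rows; it does not renumber them.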

🔧 Debug
advanced
Identify the error in filling missing values with median

What error, if any, does the following code raise?

Pandas
import pandas as pd

data = {'height': [150, 160, None, 170, None]}
df = pd.DataFrame(data)
df['height'] = df['height'].fillna(df['height'].median())
print(df)
A. TypeError because median() returns a string.
B. No error, prints DataFrame with missing values filled by median.
C. KeyError because 'height' column does not exist.
D. AttributeError because fillna() is not a method of Series.
💡 Hint

Check if the methods and column names are correct.
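
One way to check the pieces individually is to inspect what `median()` returns before passing it to `fillna()`. The sketch below uses a hypothetical series (not the challenge data) and relies on `Series.median` skipping NaN by default (`skipna=True`):

```python
import pandas as pd

# Hypothetical values, not the challenge data.
s = pd.Series([1.0, None, 3.0, 10.0])

med = s.median()                      # NaN is skipped by default, so this is the median of [1, 3, 10]
med_strict = s.median(skipna=False)   # with skipna=False, any NaN makes the result NaN
filled = s.fillna(med)

print(type(med).__name__, med)
print(filled.isna().sum())            # number of values still missing after the fill
```

Inspecting the type and value of the fill statistic first makes it easier to rule out type-related failures before looking at the fill itself.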

🚀 Application
expert
Choosing the best missing data strategy for time series data

You have a time series dataset with missing temperature readings at random times. You want to fill missing values to keep the time order intact and avoid introducing bias. Which strategy is best?

A. Fill missing values with the overall mean temperature.
B. Remove all rows with missing temperature values.
C. Use forward fill to propagate last known temperature forward.
D. Fill missing values with zero.
💡 Hint

Think about preserving time order and realistic values.
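
For ordered data, the fill can depend on neighboring timestamps instead of the whole column. Below is a minimal sketch, assuming hypothetical daily temperature readings on a `DatetimeIndex` (the values and dates are invented). Time-based interpolation is not among the options above and is shown only for comparison:

```python
import pandas as pd

# Hypothetical daily readings; None marks a missing measurement.
idx = pd.date_range("2024-01-01", periods=5, freq="D")
temps = pd.Series([20.0, None, None, 23.0, 24.0], index=idx)

carried = temps.ffill()                          # repeat the last known reading
overall_mean = temps.fillna(temps.mean())        # same constant for every gap, ignores order
interpolated = temps.interpolate(method="time")  # linear in elapsed time between known readings

print(pd.DataFrame({
    "ffill": carried,
    "mean": overall_mean,
    "time_interp": interpolated,
}))
```

Both `ffill` and `interpolate` leave the index untouched, so the time order of the series is preserved; they differ only in how the gap values are derived from nearby readings.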