You have a dataset with 10% missing values in a column that represents age. The missing values are random and not related to other variables. Which missing data strategy is most appropriate?
Think about preserving as much data as possible and using a simple method for random missing values.
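Since the values are missing at random and only 10% are affected, filling with the column mean keeps every row and leaves the column's mean unchanged, although it slightly understates the variance. Removing rows loses data unnecessarily. Filling with zero is misleading, because zero is not a plausible age and pulls the mean down. Leaving missing values in place can cause errors in analysis.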
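As a rough illustration of that trade-off, the sketch below fills a small made-up age column with its mean. The sample values and the variable names are assumptions chosen for the example, not data from the question.

```python
import pandas as pd

# Hypothetical sample: ten ages, one of them missing (10%).
ages = pd.Series([22, 35, 41, None, 29, 54, 38, 47, 31, 63], name="age")

mean_before = ages.mean()          # NaN is skipped by default
filled = ages.fillna(mean_before)  # every row is kept

# The mean is unchanged after filling...
mean_after = filled.mean()

# ...but the spread shrinks a little, because the filled value sits at the center.
std_before = ages.std()
std_after = filled.std()

print(len(filled), mean_before, mean_after)
print(std_before, std_after)
```

With more missing values the shrinkage in spread grows, which is one reason mean filling is usually reserved for small, random gaps like this one.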
What is the output of the following code?
import pandas as pd

data = {'score': [10, None, None, 20, None, 30]}
df = pd.DataFrame(data)
# ffill() is the current spelling; fillna(method='ffill') is deprecated in recent pandas
df_filled = df.ffill()
print(df_filled)
Forward fill replaces missing values with the last known value above.
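Forward fill replaces each missing value with the most recent non-missing value above it. The two None values after 10 become 10 and the None after 20 becomes 20, so the score column is 10.0, 10.0, 10.0, 20.0, 20.0, 30.0. The values print as floats because the missing entries make the column float64.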
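Two edge cases are worth seeing alongside this: a missing value at the very start has nothing above it to copy, and the `limit` argument caps how many consecutive gaps are filled. The sketch below uses a made-up series to show both.

```python
import pandas as pd

# Hypothetical scores: the first value is missing, so nothing precedes it.
s = pd.Series([None, 10, None, None, 20], name="score", dtype="float64")

filled = s.ffill()          # the leading NaN stays NaN
capped = s.ffill(limit=1)   # fill at most one consecutive missing value

print(filled.tolist())
print(capped.tolist())
```

A leftover leading NaN usually needs a separate decision, such as dropping that row or filling it by another rule.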
Given the DataFrame below, what is the result after dropping rows with any missing values?
import pandas as pd

data = {'A': [1, 2, None, 4], 'B': [None, 2, 3, 4]}
df = pd.DataFrame(data)
df_clean = df.dropna()
print(df_clean)
dropna() removes rows with any missing value.
Row 0 is missing B and row 2 is missing A, so both are dropped. Rows 1 and 3 have no missing values and remain, keeping their original index labels 1 and 3.
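The sketch below reuses the same DataFrame to show which index labels survive, plus two common follow-ups: renumbering the index and restricting the check to one column. The variable names `df_reset` and `df_a_only` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [None, 2, 3, 4]})

df_clean = df.dropna()                      # keeps index labels 1 and 3
df_reset = df_clean.reset_index(drop=True)  # relabels the rows 0 and 1
df_a_only = df.dropna(subset=['A'])         # only requires A to be present

print(df_clean.index.tolist())
print(df_reset.index.tolist())
print(df_a_only.index.tolist())
```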
What error, if any, does the following code raise?
import pandas as pd

data = {'height': [150, 160, None, 170, None]}
df = pd.DataFrame(data)
df['height'] = df['height'].fillna(df['height'].median())
print(df)
Check if the methods and column names are correct.
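The code correctly fills missing values with the median of the 'height' column. The median of the non-missing values 150, 160, and 170 is 160, so both missing entries become 160.0. No error occurs.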
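One way to confirm this is to count missing values before and after the fill. The following is a minimal sketch on the same data; the helper variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'height': [150, 160, None, 170, None]})

missing_before = df['height'].isna().sum()
median_height = df['height'].median()   # median() skips NaN by default
df['height'] = df['height'].fillna(median_height)
missing_after = df['height'].isna().sum()

print(median_height, missing_before, missing_after)
print(df['height'].tolist())
```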
You have a time series dataset with missing temperature readings at random times. You want to fill missing values to keep the time order intact and avoid introducing bias. Which strategy is best?
Think about preserving time order and realistic values.
Forward fill keeps the time order and uses the last known value, which is common for time series. Mean or zero can distort trends. Removing rows loses data and breaks continuity.
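A minimal sketch of forward fill on a time-indexed series follows. The timestamps and temperature values are made up for illustration, and the rows are deliberately out of order to show why sorting by time should come first.

```python
import pandas as pd

# Hypothetical hourly readings, stored out of chronological order.
times = pd.to_datetime([
    "2024-01-01 02:00",
    "2024-01-01 00:00",
    "2024-01-01 01:00",
    "2024-01-01 03:00",
])
temps = pd.Series([None, 5.0, None, 7.0], index=times,
                  name="temperature", dtype="float64")

# Sort by timestamp so "last known value" really means the earlier reading.
temps_sorted = temps.sort_index()
temps_filled = temps_sorted.ffill()

print(temps_filled)
```

Without the sort, the 02:00 reading would stay missing because no value precedes it in the stored order.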