Complete the code to calculate the mean target value for each category.
mean_target = df.groupby('category')['target'].[1]()
The mean() function calculates the average target value for each category, which is the basis of target encoding.
Complete the code to map the target mean to the original dataframe's category column.
df['category_encoded'] = df['category'].[1](mean_target)
The map() function replaces each category with its corresponding mean target value from mean_target.
Fix the error in the code to avoid data leakage by computing target encoding only on the training set.
mean_target_train = train_df.groupby('category')['target'].[1]() train_df['category_encoded'] = train_df['category'].map(mean_target_train) test_df['category_encoded'] = test_df['category'].map(mean_target_train).fillna([2])
We use mean to calculate the average target in the training set. For categories in the test set not seen in training, we fill missing values with 0 to avoid errors.
Fill both blanks to create a target encoding function that fits on training data and transforms new data.
def target_encode(train, test, cat_col, target_col): mean_target = train.groupby([1])[[2]].mean() train_encoded = train[cat_col].map(mean_target) test_encoded = test[cat_col].map(mean_target).fillna(mean_target.mean()) return train_encoded, test_encoded
The function groups training data by the category column and calculates the mean of the target column. These are the inputs to groupby and the column to average.
Fill all three blanks to apply target encoding with smoothing to reduce noise for categories with few samples.
def smooth_target_encode(train, test, cat_col, target_col, m=5): mean = train[target_col].mean() agg = train.groupby([1])[[2]].agg(['mean', 'count']) smooth = (agg['count'] * agg['mean'] + m * mean) / (agg['count'] + m) train_encoded = train[cat_col].map(smooth) test_encoded = test[cat_col].map(smooth).fillna(mean) return train_encoded, test_encoded
The groupby is done on the category column ('category'), and the aggregation is on the target column ('target'). The smoothing formula uses counts and means to balance category averages with the global mean.