Challenge - 5 Problems
One-hot Encoding Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
intermediate2:00remaining
What is the output of this one-hot encoding code?
Consider this Python code using pandas to one-hot encode a categorical column. What is the printed output?
ML Python
import pandas as pd data = {'color': ['red', 'blue', 'green', 'blue']} df = pd.DataFrame(data) dummies = pd.get_dummies(df['color']) print(dummies)
Attempts:
2 left
💡 Hint
Look at the order of columns created by pd.get_dummies and how the rows correspond to the original data.
✗ Incorrect
pd.get_dummies creates columns in sorted order of unique values. Here, 'blue', 'green', 'red' are sorted alphabetically. Each row has 1 in the column matching its color and 0 elsewhere.
🧠 Conceptual
intermediate1:30remaining
Why use one-hot encoding for categorical data?
Which of the following is the main reason to use one-hot encoding on categorical features before training a machine learning model?
Attempts:
2 left
💡 Hint
Think about how models interpret numeric inputs and what happens if categories are just assigned numbers.
✗ Incorrect
One-hot encoding converts categories into separate binary columns so models do not assume any order or priority among categories.
❓ Hyperparameter
advanced2:00remaining
Choosing the right approach for high-cardinality categorical features
You have a categorical feature with 10,000 unique values. Which one-hot encoding approach is best to avoid excessive memory use?
Attempts:
2 left
💡 Hint
Think about memory and model performance when many categories exist.
✗ Incorrect
Standard one-hot encoding creates too many columns, causing memory and performance issues. Target encoding reduces dimensionality by using target statistics.
❓ Metrics
advanced1:30remaining
Effect of one-hot encoding on model accuracy
You train two models on the same dataset: Model A uses label encoding for a categorical feature, Model B uses one-hot encoding. Model B shows better accuracy. Why?
Attempts:
2 left
💡 Hint
Consider how models interpret numeric values from label encoding.
✗ Incorrect
Label encoding assigns arbitrary numeric values that may imply order, misleading the model. One-hot encoding avoids this by treating categories independently.
🔧 Debug
expert2:30remaining
Why does this one-hot encoding code raise an error?
What error does this code raise and why?
import numpy as np
categories = ['cat', 'dog', 'bird']
values = ['cat', 'dog', 'fish']
one_hot = np.zeros((len(values), len(categories)))
for i, val in enumerate(values):
idx = categories.index(val)
one_hot[i, idx] = 1
print(one_hot)
Attempts:
2 left
💡 Hint
Check if all values exist in the categories list before indexing.
✗ Incorrect
The code tries to find the index of 'fish' in categories, but 'fish' is not present, causing a ValueError.