Challenge - 5 Problems
Encoding Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
intermediate2:00remaining
Output of One-Hot Encoding with pandas
What is the output DataFrame after applying one-hot encoding to the 'Color' column using pandas get_dummies?
Data Analysis Python
import pandas as pd df = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Blue']}) df_encoded = pd.get_dummies(df, columns=['Color']) print(df_encoded)
Attempts:
2 left
💡 Hint
Remember that get_dummies creates a new column for each category with 1 where the row matches that category.
✗ Incorrect
The get_dummies function creates columns named with the original column name plus the category. Each row has 1 in the column matching its category and 0 elsewhere.
❓ data_output
intermediate2:00remaining
Label Encoding Result
What is the array output after label encoding the 'Fruit' list using sklearn's LabelEncoder?
Data Analysis Python
from sklearn.preprocessing import LabelEncoder fruits = ['apple', 'banana', 'apple', 'orange', 'banana'] encoder = LabelEncoder() encoded = encoder.fit_transform(fruits) print(encoded)
Attempts:
2 left
💡 Hint
LabelEncoder assigns integers starting from 0 in alphabetical order of categories.
✗ Incorrect
The categories sorted alphabetically are 'apple', 'banana', 'orange'. They get labels 0, 1, 2 respectively.
🔧 Debug
advanced2:00remaining
Error in One-Hot Encoding with Unknown Categories
What error will this code raise when transforming new data with unseen categories using sklearn's OneHotEncoder?
Data Analysis Python
from sklearn.preprocessing import OneHotEncoder encoder = OneHotEncoder(handle_unknown='error') encoder.fit([['cat'], ['dog']]) encoder.transform([['cat'], ['bird']])
Attempts:
2 left
💡 Hint
By default, OneHotEncoder raises an error if it sees categories not seen during fit.
✗ Incorrect
The parameter handle_unknown='error' causes a ValueError when transform sees unseen categories like 'bird'.
🚀 Application
advanced2:00remaining
Choosing Encoding for High Cardinality Feature
You have a categorical feature with 10,000 unique values. Which encoding method is best to reduce memory and avoid too many columns?
Attempts:
2 left
💡 Hint
One-hot encoding creates one column per category, which is large here.
✗ Incorrect
Binary encoding reduces dimensionality by encoding categories as binary digits, using fewer columns than one-hot.
🧠 Conceptual
expert2:00remaining
Effect of Label Encoding on Tree-Based Models
Why can label encoding categorical variables be problematic for linear models but usually acceptable for tree-based models?
Attempts:
2 left
💡 Hint
Think about how models interpret numeric values of categories.
✗ Incorrect
Linear models treat encoded labels as ordered numbers, which can mislead them if categories have no order. Tree models split data based on thresholds and do not assume order.