Challenge - 5 Problems
One-hot Encoding Master
❓ Predict Output
Intermediate (time limit 2:00)
Output of One-hot Encoding with pandas get_dummies
What is the output DataFrame after applying pd.get_dummies to the 'Color' column?
import pandas as pd
df = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Blue']})
dummies = pd.get_dummies(df['Color'])
print(dummies)
💡 Hint
Check the alphabetical order of the columns created by get_dummies.
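Explanation
pd.get_dummies orders the new columns by the sorted unique values, which is alphabetical for strings. The columns are therefore 'Blue', 'Green', 'Red', and each row has a single active indicator under its own color (shown as True/False or 1/0 depending on the pandas version).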
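The column order is easy to confirm by inspecting the result. The sketch below reuses the DataFrame from the question; the names column_order and row_sums are introduced here for illustration only. It checks column names and row totals rather than the printed cells, since the cell dtype varies across pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Blue']})
dummies = pd.get_dummies(df['Color'])

# Columns follow the sorted unique values, not the order of first appearance.
column_order = list(dummies.columns)
print(column_order)

# Casting to int makes the check independent of the bool/uint8 dtype;
# every row should contain exactly one active indicator.
row_sums = dummies.astype(int).sum(axis=1).tolist()
print(row_sums)
```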
❓ Data Output
Intermediate (time limit 1:30)
Number of Columns After One-hot Encoding
Given a DataFrame with a column 'Fruit' containing ['Apple', 'Banana', 'Apple', 'Orange', 'Banana'], how many columns will the one-hot encoded DataFrame have for the 'Fruit' column?
import pandas as pd
df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana']})
dummies = pd.get_dummies(df['Fruit'])
print(dummies.shape[1])
💡 Hint
Count unique values in the 'Fruit' column.
Explanation
One-hot encoding creates one column per unique category. There are 3 unique fruits (Apple, Banana, Orange), so the encoded DataFrame has 3 columns and the code prints 3.
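The relationship between unique values and column count can be checked directly. This is a minimal sketch using the same data as the question; n_unique and n_columns are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana']})
dummies = pd.get_dummies(df['Fruit'])

n_unique = df['Fruit'].nunique()   # number of distinct categories
n_columns = dummies.shape[1]       # number of indicator columns
print(n_unique, n_columns)
```

Missing values do not get a column of their own by default; passing dummy_na=True to pd.get_dummies adds one.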
🔧 Debug
Advanced (time limit 2:00)
Identify the Error in One-hot Encoding Code
What error will this code raise when trying to one-hot encode the 'Category' column?
import pandas as pd
df = pd.DataFrame({'Category': ['A', 'B', 'C']})
encoded = pd.get_dummies(df.Category, drop_first=True)
print(encoded)
💡 Hint
Check the parentheses in the function call.
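Explanation
As printed here, the parentheses in the get_dummies call are balanced, so the code runs without error and prints indicator columns 'B' and 'C' (drop_first=True drops the first category, 'A'). A SyntaxError would occur only if the closing parenthesis of the call were missing.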
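For reference, here is a sketch of what drop_first=True does when the call is syntactically complete. It compares the full encoding with the reduced one on the same three-row DataFrame; full and reduced are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'C']})

# Full encoding: one indicator column per category.
full = pd.get_dummies(df['Category'])

# drop_first=True removes the first (sorted) category, here 'A'.
# A row with all indicators off then stands for 'A'.
reduced = pd.get_dummies(df['Category'], drop_first=True)

print(list(full.columns))
print(list(reduced.columns))
```

Dropping one column is a common way to avoid perfectly collinear indicators in linear models, since the dropped category can be inferred from the others.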
🚀 Application
Advanced (time limit 1:30)
Choosing One-hot Encoding for Machine Learning
Why is one-hot encoding preferred over label encoding for categorical variables in many machine learning models?
💡 Hint
Think about how models interpret numbers assigned to categories.
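Explanation
Label encoding assigns integers to categories, and those integers imply an order and a distance between categories that usually does not exist. Models that treat inputs as magnitudes, such as linear or distance-based models, can be misled by this. One-hot encoding gives each category its own indicator column, so no order is implied.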
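The difference is easier to see side by side. The sketch below uses a hypothetical 'Color' series and, as an assumption made here to stay within pandas, uses category codes as a stand-in for label encoding.

```python
import pandas as pd

colors = pd.Series(['Red', 'Green', 'Blue', 'Green'], name='Color')

# Label encoding: each category becomes an integer code.
# Categories are sorted, so Blue=0, Green=1, Red=2. A model reading these
# as numbers would see Red as "larger" than Blue, which is meaningless.
label_encoded = colors.astype('category').cat.codes.tolist()

# One-hot encoding: one independent indicator column per category.
one_hot = pd.get_dummies(colors).astype(int)

print(label_encoded)
print(one_hot)
```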
🧠 Conceptual
Expert (time limit 2:00)
Effect of One-hot Encoding on Data Dimensionality
What is a common drawback of using one-hot encoding on a categorical feature with very high cardinality (many unique values)?
💡 Hint
Think about how many new columns are created for many unique categories.
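Explanation
One-hot encoding creates one column per unique category, so a high-cardinality feature produces a very wide table. Because each row has only one active indicator, almost every cell is empty, which means sparse data and higher memory and compute use.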
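A small experiment shows the scale of the problem. The sketch below assumes a made-up feature, user_id, with 1,000 unique values, and measures how wide and how empty the encoded table becomes.

```python
import pandas as pd

# Hypothetical high-cardinality feature: 1,000 rows, each with a unique ID.
ids = pd.Series([f'user_{i}' for i in range(1000)], name='user_id')

dummies = pd.get_dummies(ids)
n_rows, n_cols = dummies.shape

# Exactly one cell per row is active, so density is 1 / n_cols.
density = dummies.to_numpy().sum() / (n_rows * n_cols)
print(n_cols, density)
```

Passing sparse=True to pd.get_dummies stores the indicators in sparse columns, which can reduce memory use, although the number of columns stays the same.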