0
0
Data Analysis Pythondata~20 mins

One-hot encoding in Data Analysis Python - Practice Problems & Coding Challenges

Choose your learning style9 modes available
Challenge - 5 Problems
🎖️
One-hot Encoding Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
Predict Output
intermediate
2:00remaining
Output of One-hot Encoding with pandas get_dummies
What is the output DataFrame after applying pd.get_dummies to the 'Color' column?
Data Analysis Python
import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Blue']})
dummies = pd.get_dummies(df['Color'])
print(dummies)
A
   Blue  Green  Red
0     0      0    1
1     1      0    0
2     0      1    0
3     1      0    0
B
   Red  Blue  Green
0    1     0      0
1    0     1      0
2    0     0      1
3    0     1      0
C
   Blue  Green  Red
0     1      0    0
1     0      1    0
2     0      0    1
3     0      1    0
D
   Green  Blue  Red
0      0     1    0
1      1     0    0
2      0     0    1
3      1     0    0
Attempts:
2 left
💡 Hint
Check the alphabetical order of the columns created by get_dummies.
data_output
intermediate
1:30remaining
Number of Columns After One-hot Encoding
Given a DataFrame with a column 'Fruit' containing ['Apple', 'Banana', 'Apple', 'Orange', 'Banana'], how many columns will the one-hot encoded DataFrame have for the 'Fruit' column?
Data Analysis Python
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana']})
dummies = pd.get_dummies(df['Fruit'])
print(dummies.shape[1])
A4
B5
C3
D2
Attempts:
2 left
💡 Hint
Count unique values in the 'Fruit' column.
🔧 Debug
advanced
2:00remaining
Identify the Error in One-hot Encoding Code
What error will this code raise when trying to one-hot encode the 'Category' column?
Data Analysis Python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'C']})
encoded = pd.get_dummies(df.Category, drop_first=True)
print(encoded)
ATypeError: get_dummies() got an unexpected keyword argument 'drop_firsts'
BNo error, prints the encoded DataFrame
CNameError: name 'encoded' is not defined
DSyntaxError: unexpected EOF while parsing
Attempts:
2 left
💡 Hint
Check the parentheses in the function call.
🚀 Application
advanced
1:30remaining
Choosing One-hot Encoding for Machine Learning
Why is one-hot encoding preferred over label encoding for categorical variables in many machine learning models?
ABecause one-hot encoding avoids implying an order or priority among categories.
BBecause one-hot encoding reduces the number of features compared to label encoding.
CBecause label encoding creates multiple columns which confuse models.
DBecause label encoding is only used for numerical data, not categorical.
Attempts:
2 left
💡 Hint
Think about how models interpret numbers assigned to categories.
🧠 Conceptual
expert
2:00remaining
Effect of One-hot Encoding on Data Dimensionality
What is a common drawback of using one-hot encoding on a categorical feature with very high cardinality (many unique values)?
AIt converts numerical features into categorical ones, which is undesirable.
BIt causes the model to treat all categories as the same, reducing accuracy.
CIt automatically removes rare categories, losing important information.
DIt can cause the dataset to become very large and sparse, increasing memory and computation needs.
Attempts:
2 left
💡 Hint
Think about how many new columns are created for many unique categories.