Challenge - 5 Problems

❓ Predict Output

intermediate

Output of One-Hot Encoding with pandas

What is the output DataFrame after applying one-hot encoding to the 'Color' column using pandas get_dummies?

Data Analysis Python

import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Blue']})
df_encoded = pd.get_dummies(df, columns=['Color'], dtype=int)  # dtype=int gives 0/1; pandas 2.x defaults to bool
print(df_encoded)

A.
   Color_Blue  Color_Green  Color_Red
0           0            0          1
1           1            0          0
2           0            1          0
3           1            0          0

B.
   Blue  Green  Red
0     0      0    1
1     1      0    0
2     0      1    0
3     1      0    0

C.
   Color_Blue  Color_Green  Color_Red
0           1            0          0
1           0            1          0
2           0            0          1
3           0            1          0

D.
   Color_Blue  Color_Green  Color_Red
0           0            1          0
1           1            0          0
2           0            0          1
3           1            0          0
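
For readers who want to explore how get_dummies names and orders its columns, here is a minimal sketch on a separate, hypothetical 'Size' column (not part of the quiz data). It assumes pandas is installed and passes dtype=int so the indicators print as 0/1 rather than booleans.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration.
sizes = pd.DataFrame({"Size": ["M", "S", "L", "M"]})

# dtype=int requests integer indicators instead of the boolean default
# used by recent pandas versions.
encoded = pd.get_dummies(sizes, columns=["Size"], dtype=int)

# One indicator column per category, named <column>_<category>.
print(list(encoded.columns))

# Every row has exactly one indicator set.
print(encoded.sum(axis=1).tolist())
```

Inspecting the column list and the per-row sums is a quick way to sanity-check any one-hot encoded frame.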

❓ Predict Output

intermediate

Label Encoding Result

What array is printed after label encoding the 'fruits' list with sklearn's LabelEncoder?

Data Analysis Python

from sklearn.preprocessing import LabelEncoder

fruits = ['apple', 'banana', 'apple', 'orange', 'banana']
encoder = LabelEncoder()
encoded = encoder.fit_transform(fruits)
print(encoded)

A. [0 1 0 2 1]

B. [1 2 1 3 2]

C. [2 1 2 0 1]

D. [0 0 1 2 1]
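
To see how LabelEncoder assigns its integer codes, the following sketch inspects a fitted encoder on a separate, hypothetical list of sizes (not the quiz data). It assumes scikit-learn is installed and uses only the fit_transform, classes_, and inverse_transform members.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical data, chosen only for illustration.
sizes = ["small", "large", "medium", "large"]

encoder = LabelEncoder()
codes = encoder.fit_transform(sizes)

# classes_ holds the learned categories; a category's position
# in this array is the integer code it receives.
print(encoder.classes_)
print(codes)

# inverse_transform maps the codes back to the original labels.
print(list(encoder.inverse_transform(codes)))
```

Checking classes_ before reading the codes avoids guessing at the mapping.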

🔧 Debug

advanced

Error in One-Hot Encoding with Unknown Categories

What error will this code raise when transforming new data with unseen categories using sklearn's OneHotEncoder?

Data Analysis Python

from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder(handle_unknown='error')
encoder.fit([['cat'], ['dog']])
encoder.transform([['cat'], ['bird']])

A. KeyError: 'bird'

B. ValueError: Found unknown categories ['bird'] in column 0 during transform

C. TypeError: unhashable type: 'list'

D. No error, output is a sparse matrix
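
As a contrast to the strict setting used in the question, this sketch shows the alternative handle_unknown='ignore' setting on the same toy categories. It assumes scikit-learn is installed; converting the result with toarray() is only for readable printing.

```python
from sklearn.preprocessing import OneHotEncoder

# With handle_unknown="ignore", categories not seen during fit
# are encoded as a row of all zeros instead of being rejected.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit([["cat"], ["dog"]])

# transform returns a sparse matrix by default; densify it to print.
result = encoder.transform([["cat"], ["bird"]]).toarray()
print(result)
```

Whether silently zeroing unseen categories is acceptable depends on the downstream model, so this is a design choice rather than a universal fix.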

🚀 Application

advanced

Choosing Encoding for High Cardinality Feature

You have a categorical feature with 10,000 unique values. Which encoding method is best to reduce memory and avoid too many columns?

A. Label encoding

B. Frequency encoding

C. Binary encoding

D. One-hot encoding
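
To compare how many columns each listed method produces, here is a rough pure-Python sketch. The helper names binary_encode and frequency_encode are hypothetical, written for this illustration and not taken from any library; the binary scheme assumes categories are first mapped to indices 0..n-1.

```python
import math
from collections import Counter

n_categories = 10_000

# Column counts per method for a single categorical feature.
one_hot_columns = n_categories                       # one indicator per category
binary_columns = math.ceil(math.log2(n_categories))  # bits needed to index all categories
single_column = 1                                    # label and frequency encoding

def binary_encode(index, width):
    """Return the bits of a non-negative category index, most significant first."""
    return [(index >> bit) & 1 for bit in reversed(range(width))]

def frequency_encode(values):
    """Replace each value with its relative frequency in the input."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in values]

print(one_hot_columns, binary_columns, single_column)
print(binary_encode(5, 4))
print(frequency_encode(["a", "b", "a", "c"]))
```

Note that frequency encoding maps categories with equal counts to the same value, so it trades distinguishability for compactness.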

🧠 Conceptual

expert

Effect of Label Encoding on Linear vs. Tree-Based Models

Why can label encoding categorical variables be problematic for linear models but usually acceptable for tree-based models?

A. Linear models handle missing values better than tree models when using label encoding.

B. Tree models require numeric labels, linear models do not.

C. Label encoding creates dummy variables that confuse linear models but not tree models.

D. Linear models assume numeric order in labels, which can mislead them; tree models split on values without assuming order.
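
The behaviour this question asks about can be explored with a small pure-Python experiment. The data is a hypothetical toy set in which only the middle label code has a high target value; the "tree" is a hand-rolled pair of threshold splits, an assumption made for illustration and not a real library model.

```python
# Hypothetical toy data: three unordered categories label-encoded as 0, 1, 2.
codes = [0, 0, 1, 1, 2, 2]
target = [10.0, 10.0, 50.0, 50.0, 10.0, 10.0]

# Ordinary least squares fit of target = intercept + slope * code.
n = len(codes)
mean_x = sum(codes) / n
mean_y = sum(target) / n
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(codes, target))
var_x = sum((x - mean_x) ** 2 for x in codes)
slope = cov_xy / var_x
intercept = mean_y - slope * mean_x
linear_pred = [intercept + slope * x for x in codes]

# Tree-style model: two threshold splits, each leaf predicts its mean target.
thresholds = [0.5, 1.5]

def leaf_of(code):
    """Index of the leaf a code falls into, given the thresholds."""
    return sum(code > t for t in thresholds)

leaf_values = {}
for x, y in zip(codes, target):
    leaf_values.setdefault(leaf_of(x), []).append(y)
leaf_mean = {leaf: sum(v) / len(v) for leaf, v in leaf_values.items()}
tree_pred = [leaf_mean[leaf_of(x)] for x in codes]

print(linear_pred)
print(tree_pred)
```

Comparing the two prediction lists against the targets shows how each model family treats the integer codes.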
