ML Python · programming · ~20 mins

Handling categorical variables in ML Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual · intermediate
Why use one-hot encoding for categorical variables?

Why is one-hot encoding commonly used to handle categorical variables in machine learning?

A. It reduces the number of features by combining categories into one number.
B. It replaces missing values in categorical data with the most frequent category.
C. It assigns a unique integer to each category, preserving their natural order.
D. It converts categories into a numeric format without implying any order between them.
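To see what the question is getting at, here is a minimal sketch of one-hot encoding using pandas (the column and category names are illustrative, not from the quiz):

```python
import pandas as pd

# Hypothetical example: one-hot encode a small "color" feature.
df = pd.DataFrame({"color": ["red", "blue", "green", "blue"]})
onehot = pd.get_dummies(df, columns=["color"])

# Each category becomes its own binary column, so no ordering
# (e.g. blue < green < red) is implied between categories.
print(onehot.columns.tolist())
print(onehot.shape)  # (4, 3): one row per sample, one column per category
```

Note that for a linear model, these binary columns are treated independently, which is exactly why no artificial order leaks into the model.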
Predict Output · intermediate
Output of label encoding on a categorical list

What is the output of the following Python code using sklearn's LabelEncoder?

ML Python
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
categories = ['red', 'blue', 'green', 'blue', 'red']
encoded = le.fit_transform(categories)
print(list(encoded))
A. [0, 0, 1, 1, 0]
B. [1, 2, 0, 2, 1]
C. [2, 0, 1, 0, 2]
D. [0, 1, 2, 1, 0]
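A hint for working this out without running the code: `LabelEncoder` assigns integers to the *alphabetically sorted* unique categories, not to the order in which they first appear. A small sketch on a different list (animal names are illustrative):

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
animals = ["dog", "cat", "bird", "cat"]
encoded = le.fit_transform(animals)

# classes_ holds the sorted unique labels; their positions are the codes.
print(list(le.classes_))  # ['bird', 'cat', 'dog']
print(list(encoded))      # [2, 1, 0, 1]
```

Apply the same sorting rule to `['red', 'blue', 'green', 'blue', 'red']` to predict the quiz output.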
Model Choice · advanced
Best model choice for high-cardinality categorical data

You have a dataset with a categorical feature containing 10,000 unique categories. Which model is best suited to handle this feature without extensive preprocessing?

A. Decision Tree without encoding the categorical feature
B. Gradient Boosting with target encoding of the categorical feature
C. Linear Regression with one-hot encoding of the categorical feature
D. K-Nearest Neighbors with label encoding of the categorical feature
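For context, here is a minimal hand-rolled sketch of target (mean) encoding with pandas, the technique the question alludes to for high-cardinality features (the `city` data is illustrative; real pipelines compute the means on training folds only to avoid target leakage):

```python
import pandas as pd

# Target encoding replaces each category with the mean of the target
# within that category, so even a 10,000-category feature collapses
# into a single numeric column.
df = pd.DataFrame({
    "city": ["NY", "LA", "NY", "SF", "LA", "NY"],
    "y":    [1,    0,    1,    0,    1,    0],
})
means = df.groupby("city")["y"].mean()   # NY: 2/3, LA: 0.5, SF: 0.0
df["city_te"] = df["city"].map(means)
print(df["city_te"].tolist())
```

Contrast this with one-hot encoding, which would add one column per category and is impractical at 10,000 categories.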
Hyperparameter · advanced
Choosing encoding method for tree-based models

Which encoding method is generally preferred for categorical variables when using tree-based models like Random Forest or XGBoost?

A. Label encoding to assign integers to categories
B. Binary encoding to reduce dimensionality
C. Frequency encoding to replace categories with their counts
D. One-hot encoding to create binary features for each category
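A brief sketch of why integer coding is often sufficient for tree-based models (the data is illustrative): trees split on thresholds of a single column, so a compact ordinal/label encoding keeps one column per feature instead of the wide, sparse matrix one-hot encoding produces.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder
from sklearn.ensemble import RandomForestClassifier

X = np.array([["red"], ["blue"], ["green"], ["blue"], ["red"], ["green"]])
y = np.array([1, 0, 1, 0, 1, 1])

enc = OrdinalEncoder()
X_enc = enc.fit_transform(X)   # one numeric column, not three binary ones
print(X_enc.shape)             # (6, 1)

# Trees can isolate any category through repeated threshold splits,
# so the arbitrary integer order is far less harmful than it would
# be for a linear model.
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_enc, y)
```

With a linear model the same integer codes would wrongly imply blue < green < red, which is why the trade-off differs by model family.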
Metrics · expert
Evaluating impact of encoding on model performance

You trained two models on the same dataset: Model A uses one-hot encoding for categorical variables, Model B uses target encoding. Both models are gradient boosting classifiers. Model A has 85% accuracy, Model B has 88% accuracy on the test set. What is the most likely explanation?

A. Target encoding helped Model B capture category information better, improving accuracy.
B. One-hot encoding always causes overfitting, so Model A performs worse.
C. Model B's higher accuracy is due to random chance and not encoding choice.
D. One-hot encoding is better for gradient boosting, so Model A should have higher accuracy.
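A single test-set number can be misleading, so a sketch of how one might compare the two encodings more robustly, using cross-validation to put the accuracy gap in the context of fold-to-fold variance (the synthetic data and the in-sample target encoding are illustrative simplifications; a careful comparison would compute the encoding inside each training fold):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
cats = rng.choice(list("abcd"), size=200)
# Noisy signal: class mostly follows whether the category is "a".
y = ((cats == "a") ^ (rng.random(200) < 0.1)).astype(int)

# Encoding 1: one-hot (one binary column per category).
X_onehot = pd.get_dummies(pd.Series(cats)).to_numpy()

# Encoding 2: target encoding (category -> mean of y in that category).
means = pd.Series(y).groupby(pd.Series(cats)).mean()
X_target = pd.Series(cats).map(means).to_numpy().reshape(-1, 1)

clf = GradientBoostingClassifier(random_state=0)
for name, X in [("one-hot", X_onehot), ("target", X_target)]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(name, scores.mean().round(3), "+/-", scores.std().round(3))
```

If the mean gap between the two encodings is small relative to the fold-to-fold spread, chance remains a plausible explanation; if it is consistent across folds, the encoding choice is the more likely cause.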