ML Python · programming · ~20 mins

Handling categorical variables in ML Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual · intermediate
Why use one-hot encoding for categorical variables?

Why is one-hot encoding commonly used to handle categorical variables in machine learning?

A. It reduces the number of features by combining categories into one number.
B. It replaces missing values in categorical data with the most frequent category.
C. It assigns a unique integer to each category, preserving their natural order.
D. It converts categories into a numeric format without implying any order between them.
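To see what the question is getting at, here is a minimal sketch of one-hot encoding using pandas (the column and category names are illustrative, not from the quiz):

```python
import pandas as pd

# Hypothetical example: one-hot encode a small "color" feature.
df = pd.DataFrame({"color": ["red", "blue", "green", "blue"]})
onehot = pd.get_dummies(df, columns=["color"])

# Each category becomes its own binary column, so no ordering
# (e.g. blue < green < red) is implied between categories.
print(onehot.columns.tolist())
print(onehot.shape)  # (4, 3): one row per sample, one column per category
```

Note that for a linear model, these binary columns are treated independently, which is exactly why no artificial order leaks into the model.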
Predict Output · intermediate
Output of label encoding on a categorical list

What is the output of the following Python code using sklearn's LabelEncoder?

ML Python
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
categories = ['red', 'blue', 'green', 'blue', 'red']
encoded = le.fit_transform(categories)
print(list(encoded))
A. [0, 0, 1, 1, 0]
B. [1, 2, 0, 2, 1]
C. [2, 0, 1, 0, 2]
D. [0, 1, 2, 1, 0]
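A hint for working this out without running the code: `LabelEncoder` assigns integers to the *alphabetically sorted* unique categories, not to the order in which they first appear. A small sketch on a different list (animal names are illustrative):

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
animals = ["dog", "cat", "bird", "cat"]
encoded = le.fit_transform(animals)

# classes_ holds the sorted unique labels; their positions are the codes.
print(list(le.classes_))  # ['bird', 'cat', 'dog']
print(list(encoded))      # [2, 1, 0, 1]
```

Apply the same sorting rule to `['red', 'blue', 'green', 'blue', 'red']` to predict the quiz output.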
Model Choice · advanced
Best model choice for high-cardinality categorical data

You have a dataset with a categorical feature containing 10,000 unique categories. Which model is best suited to handle this feature without extensive preprocessing?

A. Decision Tree without encoding the categorical feature
B. Gradient Boosting with target encoding of the categorical feature
C. Linear Regression with one-hot encoding of the categorical feature
D. K-Nearest Neighbors with label encoding of the categorical feature
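For context, here is a minimal hand-rolled sketch of target (mean) encoding with pandas, the technique the question alludes to for high-cardinality features (the `city` data is illustrative; real pipelines compute the means on training folds only to avoid target leakage):

```python
import pandas as pd

# Target encoding replaces each category with the mean of the target
# within that category, so even a 10,000-category feature collapses
# into a single numeric column.
df = pd.DataFrame({
    "city": ["NY", "LA", "NY", "SF", "LA", "NY"],
    "y":    [1,    0,    1,    0,    1,    0],
})
means = df.groupby("city")["y"].mean()   # NY: 2/3, LA: 0.5, SF: 0.0
df["city_te"] = df["city"].map(means)
print(df["city_te"].tolist())
```

Contrast this with one-hot encoding, which would add one column per category and is impractical at 10,000 categories.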
Hyperparameter · advanced
Choosing encoding method for tree-based models

Which encoding method is generally preferred for categorical variables when using tree-based models like Random Forest or XGBoost?

A. Label encoding to assign integers to categories
B. Binary encoding to reduce dimensionality
C. Frequency encoding to replace categories with their counts
D. One-hot encoding to create binary features for each category
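A brief sketch of why integer coding is often sufficient for tree-based models (the data is illustrative): trees split on thresholds of a single column, so a compact ordinal/label encoding keeps one column per feature instead of the wide, sparse matrix one-hot encoding produces.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder
from sklearn.ensemble import RandomForestClassifier

X = np.array([["red"], ["blue"], ["green"], ["blue"], ["red"], ["green"]])
y = np.array([1, 0, 1, 0, 1, 1])

enc = OrdinalEncoder()
X_enc = enc.fit_transform(X)   # one numeric column, not three binary ones
print(X_enc.shape)             # (6, 1)

# Trees can isolate any category through repeated threshold splits,
# so the arbitrary integer order is far less harmful than it would
# be for a linear model.
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_enc, y)
```

With a linear model the same integer codes would wrongly imply blue < green < red, which is why the trade-off differs by model family.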
Metrics · expert
Evaluating impact of encoding on model performance

You trained two models on the same dataset: Model A uses one-hot encoding for categorical variables, Model B uses target encoding. Both models are gradient boosting classifiers. Model A has 85% accuracy, Model B has 88% accuracy on the test set. What is the most likely explanation?

A. Target encoding helped Model B capture category information better, improving accuracy.
B. One-hot encoding always causes overfitting, so Model A performs worse.
C. Model B's higher accuracy is due to random chance and not encoding choice.
D. One-hot encoding is better for gradient boosting, so Model A should have higher accuracy.
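A single test-set number can be misleading, so a sketch of how one might compare the two encodings more robustly, using cross-validation to put the accuracy gap in the context of fold-to-fold variance (the synthetic data and the in-sample target encoding are illustrative simplifications; a careful comparison would compute the encoding inside each training fold):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
cats = rng.choice(list("abcd"), size=200)
# Noisy signal: class mostly follows whether the category is "a".
y = ((cats == "a") ^ (rng.random(200) < 0.1)).astype(int)

# Encoding 1: one-hot (one binary column per category).
X_onehot = pd.get_dummies(pd.Series(cats)).to_numpy()

# Encoding 2: target encoding (category -> mean of y in that category).
means = pd.Series(y).groupby(pd.Series(cats)).mean()
X_target = pd.Series(cats).map(means).to_numpy().reshape(-1, 1)

clf = GradientBoostingClassifier(random_state=0)
for name, X in [("one-hot", X_onehot), ("target", X_target)]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(name, scores.mean().round(3), "+/-", scores.std().round(3))
```

If the mean gap between the two encodings is small relative to the fold-to-fold spread, chance remains a plausible explanation; if it is consistent across folds, the encoding choice is the more likely cause.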