Recall & Review
beginner
What is one-hot encoding in machine learning?
One-hot encoding turns a categorical feature into numbers by creating one new column per category. Each row gets a 1 in the column for its category and 0 in all the others.
beginner
Why do we use one-hot encoding instead of just numbers for categories?
Integer codes such as 1, 2, 3 can lead a model to treat categories as ordered or as having magnitudes, as if one were bigger or better than another. One-hot encoding gives every category its own column, so no order is implied.
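A minimal sketch of why integer codes can mislead, assuming a model that relies on distances between feature values. The specific integer codes (0, 1, 2) and the column order are illustrative assumptions, not part of the original card.

```python
import math

# Hypothetical integer codes for three unordered colors.
int_codes = {"Red": 0, "Blue": 1, "Green": 2}

# One-hot vectors for the same colors (column order: Red, Blue, Green).
one_hot_codes = {
    "Red": [1, 0, 0],
    "Blue": [0, 1, 0],
    "Green": [0, 0, 1],
}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# With integer codes, Green looks twice as far from Red as Blue does.
int_gap_red_blue = abs(int_codes["Red"] - int_codes["Blue"])
int_gap_red_green = abs(int_codes["Red"] - int_codes["Green"])

# With one-hot vectors, every pair of distinct colors is equally far apart.
oh_gap_red_blue = euclidean(one_hot_codes["Red"], one_hot_codes["Blue"])
oh_gap_red_green = euclidean(one_hot_codes["Red"], one_hot_codes["Green"])

print(int_gap_red_blue, int_gap_red_green)
print(oh_gap_red_blue, oh_gap_red_green)
```

The integer gaps differ even though the colors have no natural order, while the one-hot gaps are identical for every pair.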
beginner
How does one-hot encoding handle a category called 'Red' in a color feature with options Red, Blue, Green?
It creates three columns: Red, Blue, Green. For 'Red', the encoded row is [1, 0, 0].
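The encoding described above can be sketched in plain Python. The helper name `one_hot` and the sample `colors` list are hypothetical, chosen only for illustration; in practice, library routines such as pandas `get_dummies` or scikit-learn `OneHotEncoder` are commonly used instead.

```python
def one_hot(value, categories):
    """Return a list with 1 at the position of `value` and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


# Column order follows the card: Red, Blue, Green.
categories = ["Red", "Blue", "Green"]

# Hypothetical sample data for the color feature.
colors = ["Red", "Green", "Blue", "Red"]

encoded = [one_hot(color, categories) for color in colors]
for color, row in zip(colors, encoded):
    print(color, row)
```

Each encoded row contains exactly one 1, which is where the name "one-hot" comes from.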
intermediate
What is a potential downside of one-hot encoding when there are many categories?
It can create many new columns, making the data large, sparse, and slow to work with. Very high dimensionality of this kind is one contributor to the 'curse of dimensionality'.
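One common way to limit the number of columns is to merge rare categories into a single bucket before encoding. The sketch below assumes a hypothetical helper `group_rare`, a made-up `cities` feature, and an arbitrary threshold of 2 occurrences; it is one option among several, not a prescribed fix.

```python
from collections import Counter


def group_rare(values, min_count=2, other="Other"):
    """Replace categories seen fewer than `min_count` times with `other`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


# Hypothetical feature with a few frequent and a few rare categories.
cities = ["Paris", "Paris", "Lima", "Oslo", "Paris", "Lima", "Quito"]

grouped = group_rare(cities, min_count=2)

# One-hot encoding creates one column per distinct category.
columns_before = len(set(cities))
columns_after = len(set(grouped))

print(grouped)
print(columns_before, columns_after)
```

The trade-off is that the model can no longer tell the merged categories apart, so the threshold is worth tuning against validation performance.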
beginner
Can one-hot encoding be used for numerical features?
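Not usually. One-hot encoding is meant for categorical features; numerical features are typically used as they are or scaled. The exception is a number that is really a category label, such as an ID code, which can be treated as categorical.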
What does one-hot encoding do to a categorical feature?
One-hot encoding creates a new column for each category and marks 1 where the category is present, 0 otherwise.
Why is one-hot encoding preferred over assigning numbers like 1, 2, 3 to categories?
Assigning numbers can make the model treat categories as ordered or as having magnitudes, which is not true for unordered categories.
If a feature has 5 categories, how many columns will one-hot encoding create?
One-hot encoding creates one column per category, so 5 categories mean 5 columns.
What problem can arise if a categorical feature has hundreds of categories and you use one-hot encoding?
Many categories create many columns, making data large and sparse, which can slow down training.
Which type of data is one-hot encoding used for?
One-hot encoding is used for categorical data; it converts each category into its own numeric column.
Explain in your own words what one-hot encoding is and why it is useful in machine learning.
Think about how categories are turned into numbers without implying order.
Describe a situation where one-hot encoding might cause problems and how you might handle it.
Consider what happens if you have hundreds of categories.