
One-hot encoding in ML Python - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is one-hot encoding in machine learning?
One-hot encoding is a way to turn categories into numbers by making a new column for each category. Each row has a 1 in the column of its own category and 0 in all the other columns.
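In pandas, `pd.get_dummies` is one common way to do this. The following is a minimal sketch on a made-up `color` column (the data is illustrative). Passing `dtype=int` asks for 0/1 integers instead of the booleans that recent pandas versions return by default:

```python
import pandas as pd

# Illustrative data: one categorical column
df = pd.DataFrame({"color": ["Red", "Blue", "Green", "Red"]})

# One new 0/1 column per distinct category, named "color_<category>"
encoded = pd.get_dummies(df, columns=["color"], dtype=int)
print(encoded)
```

Each row of `encoded` contains exactly one 1. The column order follows pandas' own category ordering (alphabetical for plain strings), not the order in which values first appear.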
beginner
Why do we use one-hot encoding instead of just numbers for categories?
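Assigning plain integer codes (for example Red=0, Blue=1, Green=2) can mislead the model into treating some categories as larger than others, or as closer to some categories than to others. One-hot encoding represents every category the same way, with no implied order.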
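The sketch below shows the ordering problem concretely by comparing integer codes with one-hot vectors for the same three colors. The code assignment Red=0, Blue=1, Green=2 is arbitrary and chosen only for illustration. The sketch uses only the standard library (`math.dist` requires Python 3.8+):

```python
import math

colors = ["Red", "Blue", "Green"]

# Integer (label) codes: Red=0, Blue=1, Green=2 (an arbitrary numbering)
label = {c: i for i, c in enumerate(colors)}

# One-hot vectors for the same categories
one_hot = {
    c: [1 if j == i else 0 for j in range(len(colors))]
    for i, c in enumerate(colors)
}

for a, b in [("Red", "Blue"), ("Red", "Green"), ("Blue", "Green")]:
    label_gap = abs(label[a] - label[b])           # distance between integer codes
    onehot_gap = math.dist(one_hot[a], one_hot[b])  # Euclidean distance between vectors
    print(f"{a}-{b}: label gap {label_gap}, one-hot distance {onehot_gap:.3f}")
```

With integer codes, Red appears twice as far from Green as from Blue, purely because of the numbering. Every pair of one-hot vectors is the same distance apart (the square root of 2).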
beginner
How does one-hot encoding handle a category called 'Red' in a color feature with options Red, Blue, Green?
It creates three columns: Red, Blue, Green. For 'Red', the encoded row is [1, 0, 0].
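The card's example can also be written out by hand. This sketch fixes the column order as Red, Blue, Green to match the card. `one_hot` and `CATEGORIES` are hypothetical names used only for illustration:

```python
# Fixed column order, matching the card: Red, Blue, Green
CATEGORIES = ["Red", "Blue", "Green"]

def one_hot(value, categories=CATEGORIES):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if c == value else 0 for c in categories]

print(one_hot("Red"))    # [1, 0, 0]
print(one_hot("Green"))  # [0, 0, 1]
```

The column order is a convention, not part of the encoding itself. Library encoders such as pandas' `get_dummies` sort string categories alphabetically, which would place Red last. What matters is that training data and prediction data use the same order.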
intermediate
What is a potential downside of one-hot encoding when there are many categories?
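It adds one column per category, so a feature with hundreds of categories produces hundreds of mostly-zero (sparse) columns. This increases memory use and training time. The extra dimensions also feed the 'curse of dimensionality': with more dimensions, the data becomes sparser relative to the feature space, and some models find it harder to generalize.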
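The size cost is easy to quantify. With k categories, every encoded row contains exactly one 1, so only 1/k of the cells are non-zero. This pure-Python sketch uses a hypothetical ID feature with 500 distinct values spread over 1,000 rows:

```python
def one_hot_matrix(values):
    """Dense one-hot matrix: one row per value, one column per distinct category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows

# Hypothetical high-cardinality feature: 500 distinct IDs over 1,000 rows
values = [f"id_{i % 500}" for i in range(1000)]
categories, rows = one_hot_matrix(values)

n_cells = len(rows) * len(categories)
n_nonzero = sum(sum(r) for r in rows)
print(len(categories))       # 500 new columns
print(n_nonzero / n_cells)   # 0.002, so 99.8% of the cells are zero
```

This is why encoders often return sparse matrices that store only the non-zero entries (scikit-learn's `OneHotEncoder` does so by default). The number of columns still grows with k, however.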
beginner
Can one-hot encoding be used for numerical features?
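Generally not for continuous numerical features such as price or height. Those are used as they are or scaled. One-hot encoding is meant for categorical features, which include categories that happen to be stored as integer codes (for example, store IDs).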
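The sketch below shows how pandas handles a mix of column types (the data is illustrative). When `get_dummies` is called without `columns`, it encodes only string/object/category columns, so numeric columns pass through unchanged. An integer column that really holds categories has to be named explicitly:

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["Red", "Blue", "Green"],  # categorical: one-hot encode
    "price": [9.99, 4.50, 12.00],       # continuous numeric: leave as is
    "store_id": [3, 7, 3],              # integer codes that are really categories
})

# Default (columns=None): only string/object/category columns are encoded,
# so "price" and "store_id" are left unchanged.
encoded = pd.get_dummies(df, dtype=int)
print(encoded.columns.tolist())

# Naming the integer-coded column explicitly makes get_dummies treat it as categorical.
encoded_ids = pd.get_dummies(df, columns=["color", "store_id"], dtype=int)
print(encoded_ids.columns.tolist())
```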
What does one-hot encoding do to a categorical feature?
A. Sorts categories alphabetically
B. Creates a new column for each category with 1 or 0 values
C. Replaces categories with random numbers
D. Removes categories from the data
Why is one-hot encoding preferred over assigning numbers like 1, 2, 3 to categories?
A. Because numbers can imply order or size, which may mislead the model
B. Because numbers take more memory
C. Because numbers are harder to compute
D. Because numbers are not allowed in machine learning
If a feature has 5 categories, how many columns will one-hot encoding create?
A. 5
B. 1
C. 10
D. 0
What problem can arise if a categorical feature has hundreds of categories and you use one-hot encoding?
A. Data becomes too small
B. Model runs faster
C. Data becomes very large and sparse, slowing down the model
D. Categories get merged automatically
Which type of data is one-hot encoding used for?
A. Numerical continuous data
B. Image data
C. Text data without categories
D. Categorical data
Explain in your own words what one-hot encoding is and why it is useful in machine learning.
Think about how categories are turned into numbers without implying order.
Describe a situation where one-hot encoding might cause problems and how you might handle it.
Consider what happens if you have hundreds of categories.
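One common mitigation is to merge rare categories into a single shared label before encoding, so that only frequent categories get their own column. The following is a minimal sketch. `group_rare`, the `min_count` threshold, and the city data are illustrative assumptions, not a standard API:

```python
from collections import Counter

def group_rare(values, min_count, other="Other"):
    """Replace categories seen fewer than `min_count` times with a shared label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

cities = ["Paris", "Paris", "Lyon", "Paris", "Lyon", "Nice", "Brest", "Metz"]
grouped = group_rare(cities, min_count=2)

print(grouped)
# Distinct categories before and after grouping (i.e. one-hot columns needed)
print(sorted(set(cities)), "->", sorted(set(grouped)))
```

Here five city columns shrink to three (Lyon, Other, Paris). Recent scikit-learn versions offer similar behavior in `OneHotEncoder` through its `min_frequency` and `max_categories` parameters.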