Recall & Review

beginner

What is one-hot encoding in machine learning?

One-hot encoding converts a categorical feature into numbers by creating one new binary column per category. Each row has a 1 in the column for its own category and a 0 in all the others.

beginner

Why do we use one-hot encoding instead of just numbers for categories?

Assigning plain integer codes such as 1, 2, 3 can mislead the model into treating some categories as larger than, or ranked above, others. One-hot encoding represents every category equally, with no implied order.
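As a rough illustration of the point about implied order, the sketch below compares the gaps between categories under integer codes and under one-hot vectors. The specific codes (Red=1, Blue=2, Green=3) are assumed for the example and are not taken from any particular library.

```python
import numpy as np

# Hypothetical integer codes for three colors (an arbitrary assignment).
label_codes = {"Red": 1, "Blue": 2, "Green": 3}

# The same colors as one-hot vectors.
one_hot = {
    "Red": np.array([1, 0, 0]),
    "Blue": np.array([0, 1, 0]),
    "Green": np.array([0, 0, 1]),
}

# With integer codes, Green looks twice as far from Red as Blue does.
label_gap_red_blue = abs(label_codes["Red"] - label_codes["Blue"])
label_gap_red_green = abs(label_codes["Red"] - label_codes["Green"])

# With one-hot vectors, every pair of categories is equally far apart.
onehot_gap_red_blue = np.linalg.norm(one_hot["Red"] - one_hot["Blue"])
onehot_gap_red_green = np.linalg.norm(one_hot["Red"] - one_hot["Green"])

print(label_gap_red_blue, label_gap_red_green)
print(onehot_gap_red_blue, onehot_gap_red_green)
```

A model that relies on distances or on weighted sums of the inputs would see the artificial 1 versus 2 gap in the integer codes, while the one-hot distances carry no such ordering.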

beginner

How does one-hot encoding handle a category called 'Red' in a color feature with options Red, Blue, Green?

It creates three columns: Red, Blue, Green. For 'Red', the encoded row is [1, 0, 0].
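The Red, Blue, Green case can be sketched by hand in a few lines. The helper name `one_hot` is hypothetical and written only for this illustration; libraries such as pandas and scikit-learn provide their own encoders.

```python
def one_hot(value, categories):
    """Return a list with 1 at the position of `value` and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if category == value else 0 for category in categories]


# Column order follows the order of this list.
categories = ["Red", "Blue", "Green"]

red = one_hot("Red", categories)
blue = one_hot("Blue", categories)
green = one_hot("Green", categories)

print(red, blue, green)  # [1, 0, 0] [0, 1, 0] [0, 0, 1]
```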

intermediate

What is a potential downside of one-hot encoding when there are many categories?

It can create a very large number of new columns that are mostly zeros, which makes the data bigger and slower to work with. The extra dimensions also contribute to what is known as the 'curse of dimensionality'.
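To give a sense of scale, the sketch below encodes a made-up feature with 500 distinct values. The `city_...` names and the row count are invented for the example.

```python
import pandas as pd

n_rows, n_categories = 1000, 500

# Synthetic high-cardinality feature: 500 distinct made-up city names.
cities = pd.Series([f"city_{i % n_categories}" for i in range(n_rows)])

# dtype=int requests 0/1 integers instead of the version-dependent default.
encoded = pd.get_dummies(cities, dtype=int)

# One column per distinct category.
n_columns = encoded.shape[1]

# Each row holds a single 1, so almost every cell is 0.
nonzero_fraction = encoded.to_numpy().sum() / encoded.size

print(n_columns, nonzero_fraction)
```

One original column becomes 500 columns here, and only one cell in every 500 is non-zero.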

beginner

Can one-hot encoding be used for numerical features?

No, one-hot encoding is meant for categorical features. Numerical features are typically used as they are or rescaled (for example, standardized).

What does one-hot encoding do to a categorical feature?

A. Sorts categories alphabetically

B. Creates a new column for each category with 1 or 0 values

C. Replaces categories with random numbers

D. Removes categories from the data

Why is one-hot encoding preferred over assigning numbers like 1, 2, 3 to categories?

A. Because numbers can imply order or size, which may mislead the model

B. Because numbers take more memory

C. Because numbers are harder to compute

D. Because numbers are not allowed in machine learning

If a feature has 5 categories, how many columns will one-hot encoding create?

C. 10

What problem can arise if a categorical feature has hundreds of categories and you use one-hot encoding?

A. Data becomes too small

B. Model runs faster

C. Data becomes very large and sparse, slowing down the model

D. Categories get merged automatically

Which type of data is one-hot encoding used for?

A. Numerical continuous data

B. Image data

C. Text data without categories

D. Categorical data

Explain in your own words what one-hot encoding is and why it is useful in machine learning.

Describe a situation where one-hot encoding might cause problems and how you might handle it.

Practice

1. What does one-hot encoding do in machine learning?

easy

A. It converts categorical labels into binary columns with 1s and 0s.

B. It normalizes numerical data to a 0-1 range.

C. It reduces the number of features by combining categories.

D. It fills missing values with the most frequent category.

Solution

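Step 1: Understand the purpose of one-hot encoding

One-hot encoding turns each category of a categorical feature into its own binary column of 1s and 0s.

Step 2: Compare options with this definition

Only option A matches. Option B describes normalization, option C describes reducing the number of features, and option D describes filling in missing values.

Final Answer:

A. It converts categorical labels into binary columns with 1s and 0s.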

Quick Check:

Solution

Step 1: Recall pandas function for one-hot encoding

Step 2: Match the correct syntax

Final Answer:

Quick Check:
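The exercise this solution belongs to is not shown here, so the following is only a sketch of the pandas call the step titles point to. The DataFrame and its `color` and `price` columns are invented for the example.

```python
import pandas as pd

# Small made-up table with one categorical and one numerical column.
df = pd.DataFrame({
    "color": ["Red", "Blue", "Green"],
    "price": [10, 20, 30],
})

# columns=[...] selects which columns to encode; the others pass through.
# dtype=int requests 0/1 integers instead of the version-dependent default.
encoded = pd.get_dummies(df, columns=["color"], dtype=int)

column_names = list(encoded.columns)
print(encoded)
```

The untouched `price` column is kept, and each color gets a prefixed column such as `color_Red`.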

Solution

Step 1: Understand pd.get_dummies on a Series

Step 2: Predict the output for given colors

Final Answer:

Quick Check:
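The colors used in the original exercise are not shown here, so the values below are assumed. As a sketch of how `pd.get_dummies` behaves on a Series of strings:

```python
import pandas as pd

# Assumed example values; the exercise's own data is not available.
colors = pd.Series(["Red", "Blue", "Red", "Green"])

# dtype=int requests 0/1 integers; depending on the pandas version,
# the default output may be booleans instead.
encoded = pd.get_dummies(colors, dtype=int)

# For plain strings the columns come out in sorted order,
# not in order of first appearance.
column_names = list(encoded.columns)
first_row = encoded.iloc[0].tolist()

print(column_names)  # ['Blue', 'Green', 'Red']
print(first_row)     # [0, 0, 1]
```

Because the columns are sorted, 'Red' lands in the last column even though it appears first in the data.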

Solution

Step 1: Identify input shape requirement for OneHotEncoder

Step 2: Fix input shape

Final Answer:

Quick Check:
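The failing code from the exercise is not shown here. A minimal sketch of the shape issue these steps describe, assuming scikit-learn's `OneHotEncoder` and made-up color values:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Assumed example values in a 1D array of shape (4,).
colors = np.array(["Red", "Blue", "Green", "Red"])

# OneHotEncoder expects 2D input (rows x features), so 1D input is rejected.
try:
    OneHotEncoder().fit_transform(colors)
    rejected_1d = False
except ValueError:
    rejected_1d = True

# Fix: reshape to one column, giving shape (4, 1).
colors_2d = colors.reshape(-1, 1)

encoder = OneHotEncoder()
# The default output is a sparse matrix; toarray() makes it dense for display.
encoded = encoder.fit_transform(colors_2d).toarray()
categories = encoder.categories_[0].tolist()

print(rejected_1d)
print(categories)
print(encoded)
```

With a pandas DataFrame, selecting the column with double brackets (for example `df[["color"]]`) gives the same 2D shape.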

Solution

Step 1: Understand the need to handle unseen categories

Step 2: Choose method that fits training data and ignores unknowns

Step 3: Avoid pd.get_dummies on combined data to prevent data leakage

Final Answer:

Quick Check:
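The exercise data is not shown here. As a sketch of the approach the steps outline, the example below fits scikit-learn's `OneHotEncoder` on training data only and uses `handle_unknown="ignore"` so that an unseen category is encoded as all zeros. The color values are assumed.

```python
from sklearn.preprocessing import OneHotEncoder

# Assumed example data: 'Yellow' never appears in the training set.
train = [["Red"], ["Blue"], ["Green"]]
test = [["Blue"], ["Yellow"]]

# Unknown categories at transform time become an all-zero row
# instead of raising an error.
encoder = OneHotEncoder(handle_unknown="ignore")

# Fit on training data only, so nothing about the test set leaks in.
encoder.fit(train)

categories = encoder.categories_[0].tolist()
test_encoded = encoder.transform(test).toarray()

print(categories)    # ['Blue', 'Green', 'Red']
print(test_encoded)  # Blue -> [1, 0, 0], Yellow -> [0, 0, 0]
```

Calling `pd.get_dummies` on the train and test sets combined would also yield consistent columns, but the encoding would then depend on the test data, which is the leakage Step 3 warns about.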