Recall & Review
beginner
What is the purpose of encoding categorical variables in data science?
Encoding categorical variables means converting categories into numbers, because most machine learning models operate on numeric input and cannot use text labels directly.
beginner
What is one-hot encoding?
One-hot encoding creates one new column per category and marks 1 if the row belongs to that category, otherwise 0. Because no category receives a larger number than another, it does not imply an order that is not there.
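A minimal sketch of one-hot encoding with pandas. The `color` column and its values are hypothetical, made up for illustration:

```python
import pandas as pd

# Hypothetical data: one categorical column with three distinct values.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# get_dummies adds one column per category, named <column>_<category>.
# dtype=int requests 0/1 integers rather than booleans.
encoded = pd.get_dummies(df, columns=["color"], dtype=int)

print(encoded)
```

Each row has exactly one 1 across the new columns, which is where the name "one-hot" comes from.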
intermediate
When should you use label encoding instead of one-hot encoding?
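Use label (ordinal) encoding when the categories have a natural order, like 'small', 'medium', 'large'. It assigns one integer per category. Make sure the integers follow that order (for example small=0, medium=1, large=2), because some tools number categories alphabetically by default.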
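As a rough illustration of encoding ordered categories, the sketch below uses an explicit mapping so the integers follow the intended order. The `sizes` data and the `size_order` dictionary are assumptions made up for this example. It also shows what happens when the order is left to pandas' default category codes, which sort the labels alphabetically:

```python
import pandas as pd

# Hypothetical ordered category.
sizes = pd.Series(["medium", "small", "large", "small"])

# Explicit mapping: the integers follow the real-world order.
size_order = {"small": 0, "medium": 1, "large": 2}
ordered_codes = sizes.map(size_order)

# Default category codes: labels are sorted alphabetically
# (large, medium, small), so the numbers no longer match the size order.
alphabetical_codes = sizes.astype("category").cat.codes

print(ordered_codes.tolist())
print(alphabetical_codes.tolist())
```

With the explicit mapping, 'large' gets the highest code. With the alphabetical default, 'large' gets the lowest, which defeats the purpose of an ordered encoding.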
intermediate
What problem can arise if you use label encoding on categories without order?
The model may treat one category as larger or smaller than another purely because of the arbitrary numbers assigned, and learn relationships that do not exist in the data.
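One way to see the problem is to compare distances between categories under each encoding. This is a small sketch with hypothetical city names and hand-written codes, not output from any particular library:

```python
# Hypothetical integer codes for a category with no natural order.
label_codes = {"Berlin": 0, "Paris": 1, "Tokyo": 2}

# One-hot vectors for the same categories.
one_hot = {"Berlin": (1, 0, 0), "Paris": (0, 1, 0), "Tokyo": (0, 0, 1)}


def label_distance(a, b):
    """Absolute difference between the integer codes of two categories."""
    return abs(label_codes[a] - label_codes[b])


def one_hot_distance(a, b):
    """Squared Euclidean distance between two one-hot vectors."""
    return sum((x - y) ** 2 for x, y in zip(one_hot[a], one_hot[b]))


# Integer codes make Tokyo look twice as far from Berlin as Paris is.
print(label_distance("Berlin", "Paris"), label_distance("Berlin", "Tokyo"))

# One-hot vectors keep every pair of distinct categories equally far apart.
print(one_hot_distance("Berlin", "Paris"), one_hot_distance("Berlin", "Tokyo"))
```

A model that relies on numeric magnitude or distance would pick up the artificial gap in the first case but not in the second.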
beginner
Name a Python library commonly used for encoding categorical variables.
Pandas and scikit-learn are popular. Pandas has get_dummies() for one-hot encoding. Scikit-learn has OneHotEncoder and OrdinalEncoder for input features, and LabelEncoder, which is intended for encoding target labels.
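A short sketch of the scikit-learn route, assuming scikit-learn and NumPy are installed. The feature arrays `X_color` and `X_size` are hypothetical. The sparse result of `OneHotEncoder` is converted with `.toarray()` only to make it easy to print:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical single-column feature matrices (shape: n_samples x 1).
X_color = np.array([["red"], ["green"], ["blue"], ["green"]])
X_size = np.array([["medium"], ["small"], ["large"], ["small"]])

# Unordered feature: one 0/1 column per category.
one_hot = OneHotEncoder()
color_encoded = one_hot.fit_transform(X_color).toarray()

# Ordered feature: pass the category order explicitly so the
# integers follow small < medium < large.
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
size_encoded = ordinal.fit_transform(X_size)

print(one_hot.categories_[0])
print(color_encoded)
print(size_encoded.ravel())
```

Unlike get_dummies(), a fitted encoder remembers its categories, so the same column layout can be reapplied to new data with `transform`.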
What does one-hot encoding do?
One-hot encoding creates new columns for each category and marks 1 if present, 0 otherwise.
Which encoding method is best for ordered categories?
Label (ordinal) encoding fits ordered categories, provided the assigned numbers follow the category order.
What risk comes from using label encoding on unordered categories?
Label encoding can mislead the model into treating categories as ordered when they are not.
Which Python function creates one-hot encoded columns?
pandas.get_dummies() converts categorical columns into one-hot encoded columns.
Why do we encode categorical variables before modeling?
Most models require numeric input, so encoding converts categories into numbers.
Explain the difference between one-hot encoding and label encoding.
Think about how each method represents categories as numbers.
Describe why encoding categorical variables is important in data science.
Consider what computers understand and how data must be prepared.