One-hot encoding in Data Analysis Python - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is one-hot encoding in data science?
One-hot encoding turns a categorical column into numbers by creating one new column per category. In each row, the column for that row's category holds 1 and every other category column holds 0.
beginner
Why do we use one-hot encoding before machine learning?
Most machine learning models need numeric input, not text labels. One-hot encoding converts categories into numbers so a model can learn from them, without implying any order between the categories.
beginner
Which Python library is commonly used for one-hot encoding?
pandas is commonly used for one-hot encoding, through its pd.get_dummies() function.
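A minimal sketch of the call, assuming a small hypothetical DataFrame with a single `color` column (the data and the column name are illustrative only). Passing `dtype=int` asks for 1/0 values; recent pandas versions return True/False by default.

```python
import pandas as pd

# Hypothetical example data: one categorical column.
df = pd.DataFrame({"color": ["red", "blue", "red"]})

# One new column per category, named <column>_<category>.
# dtype=int requests 1/0 values instead of booleans.
encoded = pd.get_dummies(df, columns=["color"], dtype=int)

print(encoded)
```

The original `color` column is replaced by the new indicator columns, so the result can be passed to a model that expects numeric input.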
beginner
What does the output look like after one-hot encoding a column with values ['red', 'blue', 'red']?
You get two new columns: one for 'red' and one for 'blue'. Rows with 'red' have 1 in the 'red' column and 0 in 'blue'. Rows with 'blue' have 1 in 'blue' and 0 in 'red'.
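The same result can be worked out by hand without any library. The sketch below is only an illustration of the idea, not how pandas implements it; the variable names are made up for this example.

```python
colors = ["red", "blue", "red"]

# Collect the distinct categories in a stable (sorted) order.
categories = sorted(set(colors))

# For each row, put 1 in the column matching its category and 0 elsewhere.
rows = [{cat: int(value == cat) for cat in categories} for value in colors]

for value, row in zip(colors, rows):
    print(value, row)
```

Each row ends up with exactly one 1, which is where the name "one-hot" comes from.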
intermediate
What is a potential downside of one-hot encoding?
It creates one new column per category, so a column with many distinct categories produces a very wide table that takes more memory and is slower to process.
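To see the growth concretely, the sketch below encodes a hypothetical `city` column with 1,000 distinct labels (the data is generated purely for illustration) and compares the table shape before and after.

```python
import pandas as pd

# Hypothetical column with 1,000 distinct category labels.
df = pd.DataFrame({"city": [f"city_{i}" for i in range(1000)]})

encoded = pd.get_dummies(df, columns=["city"], dtype=int)

print(df.shape)       # shape of the original single-column table
print(encoded.shape)  # shape after encoding: one column per distinct category
```

Checking `df["city"].nunique()` before encoding is a quick way to estimate how many columns will be added.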
What does one-hot encoding do to a categorical column?
A. Creates new columns for each category with 1 or 0 values
B. Replaces categories with their length
C. Sorts categories alphabetically
D. Removes categories with low frequency
Which pandas function is used for one-hot encoding?
A. pd.fillna()
B. pd.to_numeric()
C. pd.merge()
D. pd.get_dummies()
If a column has 4 unique categories, how many new columns will one-hot encoding create by default (with no category dropped)?
A. 1
B. 3
C. 4
D. 2
Why might one-hot encoding increase data size?
A. Because it adds many new columns for categories
B. Because it duplicates rows
C. Because it changes data types to strings
D. Because it removes missing values
Which type of data is one-hot encoding mainly used for?
A. Time series data
B. Categorical data
C. Numerical data
D. Text data
Explain what one-hot encoding is and why it is useful in data science.
Think about how computers understand data and why categories need to be numbers.
Describe a simple example of one-hot encoding with a small list of colors.
Imagine you have colors like red, blue, and green.