Encoding categorical variables in Data Analysis Python - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is the purpose of encoding categorical variables in data science?
Encoding categorical variables means converting categories into numbers, because most machine learning models operate only on numeric input.
beginner
What is one-hot encoding?
One-hot encoding creates a new column for each category and marks 1 if the row belongs to that category, otherwise 0. It avoids implying an order among the categories that does not exist.
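As a minimal sketch of one-hot encoding, the example below uses pandas' get_dummies() on a small made-up DataFrame. The column name "color" and its values are assumptions chosen for illustration only.

```python
import pandas as pd

# Hypothetical example data: one unordered categorical column.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# dtype=int asks for 0/1 integers; recent pandas versions return booleans by default.
one_hot = pd.get_dummies(df, columns=["color"], dtype=int)

# One new column per category: color_blue, color_green, color_red.
print(one_hot)
```

Each row has exactly one 1 across the new columns, so no category is ranked above another.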
intermediate
When should you use label encoding instead of one-hot encoding?
Use label encoding when the categories have a natural order, like 'small', 'medium', 'large'. Each category is assigned an integer that should follow that order, for example 0, 1, 2.
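One way to keep the natural order is to write the mapping out by hand. This is a sketch under assumptions: the size_order dictionary and the sample values are made up for illustration. An explicit mapping is used here because scikit-learn's LabelEncoder numbers categories in sorted (alphabetical) order, which would put 'large' before 'small'.

```python
import pandas as pd

# Hypothetical example data with a natural order: small < medium < large.
sizes = pd.Series(["medium", "small", "large", "small"])

# Explicit mapping so that the integers follow the real-world order.
size_order = {"small": 0, "medium": 1, "large": 2}

encoded = sizes.map(size_order)
print(encoded.tolist())
```

Any value missing from the mapping becomes NaN, which makes unexpected categories easy to spot.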
intermediate
What problem can arise if you use label encoding on categories without order?
The model may treat one category as larger or smaller than another because of the assigned numbers, which can distort its results.
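The sketch below illustrates this problem on made-up color data (the values are assumptions for illustration). Integer codes allow arithmetic that has no meaning for unordered categories.

```python
import pandas as pd

# Hypothetical unordered categories.
colors = pd.Series(["red", "green", "blue"])

# Category codes are assigned in sorted order: blue=0, green=1, red=2.
codes = colors.astype("category").cat.codes

# The "average" of red (2) and blue (0) equals the code for green (1),
# a relationship that exists only because of the arbitrary numbering.
avg_red_blue = (int(codes[0]) + int(codes[2])) / 2
print(codes.tolist(), avg_red_blue)
```

A model that uses these codes as ordinary numbers can pick up the same spurious relationship.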
beginner
Name a Python library commonly used for encoding categorical variables.
Pandas and scikit-learn are popular. Pandas has get_dummies() for one-hot encoding, and scikit-learn has LabelEncoder and OneHotEncoder.
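For comparison with pandas, here is a minimal sketch of the two scikit-learn classes named above, applied to a made-up array of colors (an assumption for illustration). Note that the scikit-learn documentation describes LabelEncoder as intended for target labels rather than input features.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical example data.
colors = np.array(["red", "green", "blue", "green"])

# LabelEncoder: one integer per category, numbered in sorted order.
label_encoder = LabelEncoder()
labels = label_encoder.fit_transform(colors)

# OneHotEncoder expects a 2-D input and returns a sparse matrix by default,
# so reshape to one column and convert to a dense array for printing.
one_hot_encoder = OneHotEncoder()
one_hot = one_hot_encoder.fit_transform(colors.reshape(-1, 1)).toarray()

print(label_encoder.classes_, labels)
print(one_hot)
```

A fitted encoder can be reused with transform() on new data, so the same category always receives the same encoding.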
What does one-hot encoding do?
A. Combines categories into one column
B. Assigns a unique number to each category
C. Removes categorical variables from data
D. Creates new columns for each category with 1 or 0 values
Which encoding method is best for ordered categories?
A. Label encoding
B. One-hot encoding
C. Random encoding
D. Frequency encoding
What risk comes from using label encoding on unordered categories?
A. Data size increases
B. Model treats categories as ordered numbers
C. Categories get removed
D. Encoding fails to run
Which Python function creates one-hot encoded columns?
A. pandas.get_dummies()
B. numpy.array()
C. sklearn.preprocessing.LabelEncoder()
D. matplotlib.pyplot.plot()
Why do we encode categorical variables before modeling?
A. To remove missing values
B. To reduce data size
C. Most models only accept numeric input
D. To sort data alphabetically
Explain the difference between one-hot encoding and label encoding.
Think about how each method represents categories as numbers.
Describe why encoding categorical variables is important in data science.
Consider what computers understand and how data must be prepared.