
Categorical Data Type Optimization for Data Analysis in Python - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is a categorical data type in data analysis?
A categorical data type stores data that takes a limited set of possible values, such as colors or size labels. Storing such data this way can save memory and speed up analysis.
beginner
Why should you convert string columns with repeated values to categorical type?
Because the categorical type stores each unique value only once and represents every row with a small integer code, which reduces memory use and can make operations such as grouping and comparison faster.
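To make the memory claim concrete, here is a minimal sketch that compares the footprint of the same column stored as strings and as a category. The data and the variable names are made up for illustration, and the exact byte counts depend on the pandas version, so none are shown.

```python
import pandas as pd

# Hypothetical data: 100,000 rows drawn from only three distinct strings.
colors = pd.Series(["red", "green", "blue", "green"] * 25_000)

# deep=True asks pandas to include the memory of the string values themselves.
string_bytes = colors.memory_usage(deep=True)
category_bytes = colors.astype("category").memory_usage(deep=True)

print(f"strings:  {string_bytes:,} bytes")
print(f"category: {category_bytes:,} bytes")
```

With so few distinct values, the categorical version should come out much smaller than the string version.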
intermediate
How does pandas represent categorical data internally?
Pandas stores categorical data as integer codes that point to a list of unique categories, which saves space compared to storing full strings repeatedly.
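The two parts of this representation can be inspected through the .cat accessor. The following is a small sketch with an invented example column; by default pandas sorts string categories, so the codes follow that sorted order rather than the order of first appearance.

```python
import pandas as pd

# Hypothetical example column with three distinct labels.
sizes = pd.Series(["small", "large", "small", "medium"], dtype="category")

# The unique values, each stored once.
print(list(sizes.cat.categories))  # ['large', 'medium', 'small']

# One integer per row, pointing into the categories list.
print(sizes.cat.codes.tolist())    # [2, 0, 2, 1]
```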
beginner
What is a common method to convert a column to categorical in pandas?
Use the astype method on the column: df['column'] = df['column'].astype('category'). The result must be assigned back, because astype returns a new Series rather than modifying the column in place.
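A short sketch of the conversion on a small DataFrame follows. The column names and values are invented for illustration. It shows the single-column form and the equivalent DataFrame.astype call with a column-to-dtype mapping, which scales to several columns.

```python
import pandas as pd

# Hypothetical DataFrame with two repetitive string columns.
df = pd.DataFrame({
    "city": ["Paris", "Tokyo", "Paris", "Lima"],
    "status": ["open", "closed", "open", "open"],
    "sales": [120, 95, 130, 80],
})

# Convert a single column and assign the result back.
df["city"] = df["city"].astype("category")

# The same conversion via a column-to-dtype mapping on the DataFrame.
df = df.astype({"status": "category"})

print(df.dtypes)
```

The numeric sales column is left untouched; only the columns named in the conversion change type.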
intermediate
What is a potential downside of using categorical data type?
If the column has many unique values (high cardinality), converting to categorical might not save memory and can add overhead.
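One way to check for this before converting is to look at the share of unique values and compare memory use directly. The sketch below uses an invented column in which every value is unique, the worst case for the categorical type; exact byte counts vary by pandas version, so none are shown.

```python
import pandas as pd

# Hypothetical high-cardinality column: every value is unique.
ids = pd.Series([f"user_{i:06d}" for i in range(50_000)])

# Share of distinct values; close to 1.0 means little repetition to exploit.
unique_ratio = ids.nunique() / len(ids)

plain_bytes = ids.memory_usage(deep=True)
category_bytes = ids.astype("category").memory_usage(deep=True)

print(f"unique ratio: {unique_ratio:.2f}")
print(f"strings:  {plain_bytes:,} bytes")
print(f"category: {category_bytes:,} bytes")
```

Here the categorical version has to store every string once as a category plus an integer code per row, so it typically ends up larger than the plain string column.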
What does converting a column to categorical type mainly help with?
A. Increasing the number of unique values
B. Reducing memory usage
C. Changing data to numbers only
D. Removing missing values
Which pandas method converts a column to categorical?
A. df['col'].astype('category')
B. df['col'].to_numeric()
C. df['col'].fillna()
D. df['col'].unique()
What is stored internally for categorical data in pandas?
A. Boolean values
B. Full strings repeated
C. Integer codes and categories list
D. Only numbers
When might categorical type NOT save memory?
A. When there are many unique values
B. When data is numeric
C. When data has missing values
D. When data is already numeric
Which of these is a benefit of categorical data type?
A. Converting numbers to strings
B. More precise floating point calculations
C. Automatic data cleaning
D. Faster comparisons and grouping
Explain what categorical data type is and why it is useful in data analysis.
Think about how repeated strings can be stored more efficiently.
Describe how to convert a string column to categorical in pandas and when you should do it.
Focus on the pandas method and the type of data that benefits most.