
Converting to categorical in Pandas - Step-by-Step Execution

Concept Flow - Converting to categorical
Start with DataFrame
Select column to convert
Apply pd.Categorical()
Column type changes to 'category'
Use categorical benefits: usually less memory and faster operations when values repeat
End
Start with a DataFrame, pick a column, convert it to categorical type, then use the benefits of categories.
Execution Sample
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})
df['color'] = pd.Categorical(df['color'])
print(df['color'])
This code converts the 'color' column in the DataFrame to categorical type.
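For comparison, pandas also offers `Series.astype('category')`, which the sample above does not use. The sketch below assumes the same 'color' data and checks that both routes give the same dtype and values; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})

# Route used in the sample: build a Categorical from the column.
via_constructor = pd.Series(pd.Categorical(df['color']), name='color')

# Alternative route (not in the original sample): cast the column's dtype.
via_astype = df['color'].astype('category')

# Both Series should carry the same categorical dtype and the same values.
print(via_constructor.dtype == via_astype.dtype)
print(via_constructor.tolist() == via_astype.tolist())
```

Either form needs the result assigned back to df['color'] for the DataFrame itself to change.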
Execution Table
| Step | Action | Input/Variable | Result/Output |
|---|---|---|---|
| 1 | Create DataFrame | {'color': ['red', 'blue', 'green', 'blue', 'red']} | DataFrame with 'color' column as object type |
| 2 | Select 'color' column | df['color'] | Series with values ['red', 'blue', 'green', 'blue', 'red'] and dtype object |
| 3 | Convert to categorical | pd.Categorical(df['color']) | Categorical dtype with categories ['blue', 'green', 'red'] |
| 4 | Assign back to df['color'] | df['color'] = pd.Categorical(df['color']) | 'color' column dtype changes to 'category' |
| 5 | Print df['color'] | df['color'] | Shows categorical data with categories and values |
| 6 | Check dtype | df['color'].dtype | category |
💡 Conversion complete, 'color' column is now categorical type.
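The steps in the table can be traced in code. This is a minimal sketch that records the dtype at each stage; the names `dtype_before`, `cat`, `dtype_after_step3`, and `dtype_after_step4` are illustrative and not part of the original sample. The exact dtype reported before conversion depends on the pandas version (object in the versions this walkthrough describes).

```python
import pandas as pd

# Steps 1-2: create the DataFrame and look at the column's dtype.
df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})
dtype_before = df['color'].dtype

# Step 3: build a Categorical; the DataFrame itself is not modified yet.
cat = pd.Categorical(df['color'])
dtype_after_step3 = df['color'].dtype

# Step 4: assign back, which replaces the column with the categorical data.
df['color'] = cat
dtype_after_step4 = df['color'].dtype

# Steps 5-6: inspect the results.
print(dtype_before, dtype_after_step3, dtype_after_step4)
```

The first two printed dtypes match, because only the assignment in step 4 changes the column.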
Variable Tracker
| Variable | Start | After Step 1 | After Step 3 | After Step 4 | Final |
|---|---|---|---|---|---|
| df | None | {'color': ['red', 'blue', 'green', 'blue', 'red']} | Same DataFrame, 'color' still object dtype | Same DataFrame, 'color' changed to categorical dtype | DataFrame with 'color' column as categorical |
| df['color'] | None | Series of strings (object) | Series of strings (object) | Categorical dtype with categories ['blue', 'green', 'red'] | Categorical dtype Series |
Key Moments - 3 Insights
Why does the 'color' column dtype change after assignment?
Because pd.Categorical creates a categorical object, and assigning it back replaces the original column with that object, as shown in step 4 of the Execution Table.
Are the original string values changed when converting to categorical?
No. The values stay the same; internally each one is stored as a small integer code that points into the list of categories. Step 5 of the Execution Table shows the same values printed, now with dtype 'category'.
What are the categories created during conversion?
Categories are the unique values from the column, sorted when the values can be ordered. Here they are ['blue', 'green', 'red'], as shown in step 3 of the Execution Table.
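The points above can be checked through the `.cat` accessor. This is a small sketch on the same 'color' data; the variable names `original`, `categories`, `codes`, and `rebuilt` are illustrative.

```python
import pandas as pd

original = ['red', 'blue', 'green', 'blue', 'red']
df = pd.DataFrame({'color': original})
df['color'] = pd.Categorical(df['color'])

# The unique values, sorted: these are the categories.
categories = list(df['color'].cat.categories)

# Each row holds an integer code that indexes into the categories.
codes = df['color'].cat.codes.tolist()

# Mapping the codes back through the categories recovers the original values.
rebuilt = [categories[code] for code in codes]

print(categories)   # ['blue', 'green', 'red']
print(codes)        # [2, 0, 1, 0, 2]
print(rebuilt == original)
```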
Visual Quiz - 3 Questions
Test your understanding
According to the Execution Table, what is the dtype of df['color'] after step 3?
A. object
B. int
C. category
D. string
💡 Hint
Step 3 creates a Categorical (dtype 'category') but does not assign it to df['color'] yet; the dtype of df['color'] remains 'object' until step 4.
At which step does the 'color' column in df change to categorical dtype?
A. Step 3
B. Step 4
C. Step 2
D. Step 5
💡 Hint
Look at the 'Action' and 'Result/Output' columns in the Execution Table to see when the assignment happens.
If we skip assigning pd.Categorical back to df['color'], what will be the dtype of df['color']?
A. category
B. int
C. object
D. float
💡 Hint
Refer to the Variable Tracker for df['color'] before and after assignment.
Concept Snapshot
Convert a DataFrame column to categorical with:
  df['col'] = pd.Categorical(df['col'])
This changes the dtype to 'category', which usually saves memory and can speed up operations when values repeat.
Categories are the unique values, sorted when possible.
Assign back to keep the change.
Use categorical for repeated string data.
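To see the memory effect, a larger column with many repeats is more telling than the five-row sample. The sketch below uses a hypothetical 30,000-row Series (not from the original example) and compares `memory_usage(deep=True)` before and after conversion; exact byte counts vary by pandas version and platform, so none are quoted here.

```python
import pandas as pd

# Hypothetical larger column: three distinct strings repeated many times.
colors = pd.Series(['red', 'blue', 'green'] * 10_000)
as_category = pd.Series(pd.Categorical(colors))

# deep=True counts the memory held by the string values themselves.
plain_bytes = colors.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(plain_bytes, category_bytes)
```

With few distinct values and many rows, the categorical version should come out smaller, since each row holds only a small integer code. For columns where nearly every value is unique, the saving can disappear.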
Full Transcript
We start with a DataFrame containing a column of strings. We select that column and convert it to a categorical type using the pd.Categorical constructor. This creates a categorical object whose categories are the column's unique values. We assign this back to the DataFrame column, changing its dtype to 'category'. This conversion keeps the original values but stores them more efficiently. We can verify the change by printing the column and checking its dtype. This process helps save memory and can speed up operations on repeated string data.