Same DataFrame, 'color' changed to categorical dtype
DataFrame with 'color' column as categorical
df['color'] | None | Series of strings (object) | Series of strings (object) | Categorical dtype with categories ['blue', 'green', 'red'] | Categorical dtype Series
Key Moments - 3 Insights
Why does the 'color' column dtype change after assignment?
Because pd.Categorical builds a new categorical object, and assigning it back to df['color'] replaces the original string column with that object, as shown in execution_table step 4.
Are the original string values changed when converting to categorical?
No. The values remain the same, but pandas stores each distinct value once as a category and represents every row by a small integer code. Execution_table step 5 shows the values printing the same while the dtype is 'category'.
What are the categories created during conversion?
By default, the categories are the unique values of the column in sorted order, here ['blue', 'green', 'red'], as shown in execution_table step 3.
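The three answers above can be checked with a short sketch. The sample rows are an assumption (the original data is not shown here); they were chosen only so that the categories come out as ['blue', 'green', 'red'].

```python
import pandas as pd

# Hypothetical sample data; the tutorial's actual rows are not shown.
df = pd.DataFrame({"color": ["red", "blue", "green", "blue", "red"]})
original = df["color"].tolist()

# Build the categorical without assigning it: the column is untouched.
cat = pd.Categorical(df["color"])
# Typically 'object' for strings; some pandas versions report a
# dedicated string dtype instead. Either way it is not 'category'.
dtype_before = str(df["color"].dtype)

# Assigning back replaces the column with the categorical.
df["color"] = cat
dtype_after = str(df["color"].dtype)

# Categories are the unique values, sorted by default.
categories = list(df["color"].cat.categories)

# Each row is stored as an integer code pointing into the categories.
codes = df["color"].cat.codes.tolist()

# The visible values are the same as before the conversion.
values_unchanged = df["color"].tolist() == original

print(dtype_before, "->", dtype_after)
print(categories, codes, values_unchanged)
```

The `.cat` accessor used here is available only on categorical Series, so it also serves as a quick check that the assignment took effect.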
Visual Quiz - 3 Questions
Test your understanding
According to the execution_table, what is the dtype of df['color'] after step 3?
A. object
B. int
C. category
D. string
💡 Hint
Step 3 creates a Categorical (dtype 'category') but does not assign it to df['color'] yet; the dtype of df['color'] remains 'object' until step 4.
At which step does the 'color' column in df change to categorical dtype?
A. Step 3
B. Step 4
C. Step 2
D. Step 5
💡 Hint
Look at the 'Action' and 'Result/Output' columns in execution_table for when assignment happens.
If we skip assigning pd.Categorical back to df['color'], what will be the dtype of df['color']?
A. category
B. int
C. object
D. float
💡 Hint
Refer to variable_tracker for df['color'] before and after assignment.
Concept Snapshot
Convert a DataFrame column to categorical with:
df['col'] = pd.Categorical(df['col'])
This changes the dtype to 'category', which typically saves memory and can speed up operations such as grouping when the column has few distinct values.
Categories are the unique values, sorted by default.
Assign back to keep the change.
Use categorical for repeated string data.
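The memory claim in the snapshot can be measured directly. The following is a minimal sketch with made-up data (three colors repeated 10,000 times); the exact byte counts depend on the pandas version and platform, so only the comparison matters.

```python
import pandas as pd

# Hypothetical data: a few distinct strings repeated many times.
colors = ["red", "blue", "green"] * 10_000
df = pd.DataFrame({"color": colors})

# deep=True counts the string contents, not just the references.
bytes_as_strings = df["color"].memory_usage(deep=True)

# Convert and assign back so the DataFrame keeps the change.
df["color"] = pd.Categorical(df["color"])
bytes_as_category = df["color"].memory_usage(deep=True)

print(f"strings:  {bytes_as_strings} bytes")
print(f"category: {bytes_as_category} bytes")
```

The saving comes from storing each distinct string once and keeping only a small integer per row. For a column where nearly every value is unique, the conversion may not reduce memory at all.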
Full Transcript
We start with a DataFrame containing a column of strings. We select that column and convert it to a categorical type using the pd.Categorical constructor from pandas. This creates a categorical object whose categories are the column's unique values. We assign this back to the DataFrame column, changing its dtype to 'category'. This conversion keeps the original values but stores them more efficiently. We can verify the change by printing the column and checking its dtype. This process helps save memory and can speed up operations on repeated string data.