
Converting to categorical in Pandas - Step-by-Step Execution


Concept Flow - Converting to categorical

Start with DataFrame

↓

Select column to convert

↓

Apply pd.Categorical()

↓

Column type changes to 'category'

↓

Use categorical benefits: less memory, faster ops

↓

End

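Start with a DataFrame, pick a column, and convert it to the categorical dtype. The column then stores each distinct value once and refers to it by an integer code, which typically lowers memory use and can speed up operations when values repeat.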

Execution Sample

Pandas

import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})
df['color'] = pd.Categorical(df['color'])
print(df['color'])

This code converts the 'color' column in the DataFrame to categorical type.
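To make the conversion less abstract, the sketch below looks inside the converted column through the `.cat` accessor. It reuses the sample data above; the variable names `categories` and `codes` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})
df['color'] = pd.Categorical(df['color'])

# The distinct labels, stored once (sorted for sortable values like strings)
categories = df['color'].cat.categories.tolist()

# One small integer per row: the position of that row's label in `categories`
codes = df['color'].cat.codes.tolist()

print(categories)  # ['blue', 'green', 'red']
print(codes)       # [2, 0, 1, 0, 2]
```

Each row is stored as a code pointing into the category list, which is where the memory saving on repeated values comes from.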

Execution Table

Step	Action	Input/Variable	Result/Output
1	Create DataFrame	{'color': ['red', 'blue', 'green', 'blue', 'red']}	DataFrame with 'color' column as object type
2	Select 'color' column	df['color']	Series with values ['red', 'blue', 'green', 'blue', 'red'] and dtype object
3	Convert to categorical	pd.Categorical(df['color'])	Categorical object with categories ['blue', 'green', 'red']; df['color'] is still object dtype
4	Assign back to df['color']	df['color'] = pd.Categorical(df['color'])	'color' column dtype changes to 'category'
5	Print df['color']	df['color']	Prints the five values, then dtype: category and the three categories
6	Check dtype	df['color'].dtype	category

💡 Conversion complete, 'color' column is now categorical type.
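The table can be replayed by splitting the one-line conversion into its two steps and checking the dtype after each. This is a minimal sketch; the names `cat`, `dtype_before`, `dtype_after_step3` and `dtype_after_step4` are introduced here for illustration. The starting dtype is `object` in the pandas versions this walkthrough assumes; other versions may report a dedicated string dtype instead.

```python
import pandas as pd

# Step 1: create the DataFrame
df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})
dtype_before = df['color'].dtype

# Step 3: build the Categorical; df itself is not modified yet
cat = pd.Categorical(df['color'])
dtype_after_step3 = df['color'].dtype

# Step 4: assign back, replacing the original column
df['color'] = cat
dtype_after_step4 = df['color'].dtype

print(dtype_before, dtype_after_step3, dtype_after_step4)
```

The column only becomes categorical at step 4, because step 3 creates a separate object and leaves the DataFrame alone.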

Variable Tracker

Variable	Start	After Step 1	After Step 3	After Step 4	Final
df	None	{'color': ['red', 'blue', 'green', 'blue', 'red']}	Same DataFrame, 'color' still object dtype	Same DataFrame, 'color' changed to categorical dtype	DataFrame with 'color' column as categorical
df['color']	None	Series of strings (object)	Series of strings (object)	Categorical dtype with categories ['blue', 'green', 'red']	Categorical dtype Series

Key Moments - 3 Insights

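Why does the 'color' column dtype change after assignment?
Step 3 only builds a new Categorical object. The column changes in step 4, when that object is assigned back to df['color'] and replaces the original strings.

Are the original string values changed when converting to categorical?
No. Every row still shows the same value; only the internal storage changes.

What are the categories created during conversion?
The unique values of the column, sorted: ['blue', 'green', 'red'].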
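A quick way to check that the conversion leaves the visible values untouched is to compare them before and after. The sketch below uses the same sample data; `values_before` and `values_after` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})
values_before = df['color'].tolist()

df['color'] = pd.Categorical(df['color'])
values_after = df['color'].tolist()

# Same values in the same order; only the storage behind them changed
print(values_before == values_after)  # True
print(df['color'].nunique())          # 3
```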

Visual Quiz - 3 Questions

Test your understanding

Look at the Execution Table. What is the dtype of df['color'] after step 3?

A. object

B. int

C. category

D. string

Concept Snapshot

Convert a DataFrame column to categorical with:
  df['col'] = pd.Categorical(df['col'])
This changes dtype to 'category', which saves memory and can speed up operations when values repeat.
Categories are the unique values, sorted.
Assign back to keep the change.
Use categorical for repeated string data.
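The memory claim can be checked with `Series.memory_usage(deep=True)`. This is a rough sketch on made-up data: the five sample colors repeated 20,000 times, so the column is long but has only three distinct values. Exact byte counts depend on the pandas version and platform, so none are shown.

```python
import pandas as pd

# Hypothetical larger column: 100,000 rows, only 3 distinct values
colors = ['red', 'blue', 'green', 'blue', 'red'] * 20_000

as_strings = pd.Series(colors)
as_category = pd.Series(pd.Categorical(colors))

# deep=True also counts the memory of the string objects themselves
bytes_strings = as_strings.memory_usage(deep=True)
bytes_category = as_category.memory_usage(deep=True)

print(bytes_strings, bytes_category)
print(bytes_category < bytes_strings)
```

The saving depends on how few distinct values there are relative to the number of rows, so a column of mostly unique strings gains little or nothing from the conversion.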

Full Transcript

We start with a DataFrame containing a column of strings. We select that column and convert it to a categorical type using pandas' pd.Categorical function. This creates a categorical object with unique categories. We assign this back to the DataFrame column, changing its dtype to 'category'. This conversion keeps the original values but stores them more efficiently. We can verify the change by printing the column and checking its dtype. This process helps save memory and can speed up operations on repeated string data.
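pandas also offers `Series.astype('category')` for the same conversion. It is not used in the walkthrough above, so the sketch below only shows that, on this sample data, both routes give the same categories and values. `via_constructor` and `via_astype` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']})

# Route used in this walkthrough: build a Categorical from the column
via_constructor = pd.Series(pd.Categorical(df['color']), name='color')

# Alternative route: cast the existing Series
via_astype = df['color'].astype('category')

same_dtype = via_constructor.dtype == via_astype.dtype
print(same_dtype)
print(via_astype.cat.categories.tolist())
```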