
Why categorical type matters in Pandas

Introduction

The categorical type lets pandas store and process repeated text data more efficiently. Each distinct value is stored only once, which saves memory and can speed up operations such as grouping and sorting.

Consider the categorical type when:
You have a column with a small number of distinct text values that repeat often, such as colors or categories.
You want to reduce memory use when working with large datasets.
You want faster sorting or grouping by text columns.
You want to control the order of categories for analysis or plotting.
Syntax
Pandas
df['column'] = df['column'].astype('category')
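This converts the column to the categorical type; the categories are inferred from the values present in the column.
To fix the set of categories or give them an order, build the column with pd.Categorical or pass a pd.CategoricalDtype to astype.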
Examples
Convert the 'color' column to categorical type.
Pandas
df['color'] = df['color'].astype('category')
Create an ordered categorical column for sizes.
Pandas
df['size'] = pd.Categorical(df['size'], categories=['small', 'medium', 'large'], ordered=True)
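As a minimal sketch of what the ordered flag buys you, the example below sorts and filters by size. The DataFrame contents are hypothetical sample data, not part of the original lesson.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({'size': ['medium', 'small', 'large', 'small']})

# Declare the categories and their order explicitly
df['size'] = pd.Categorical(
    df['size'], categories=['small', 'medium', 'large'], ordered=True
)

# Sorting follows the declared order, not alphabetical order
sorted_sizes = df['size'].sort_values().tolist()

# Ordered categoricals can be compared against one of their categories
at_least_medium = (df['size'] >= 'medium').tolist()

print(sorted_sizes)
print(at_least_medium)
```

Without ordered=True, the comparison with >= would raise an error, because pandas has no way to rank unordered categories.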
Sample Program

This code shows how converting a text column to categorical reduces memory use and lists the categories.

Pandas
import pandas as pd

# Create a sample DataFrame
data = {'color': ['red', 'blue', 'red', 'green', 'blue', 'blue']}
df = pd.DataFrame(data)

# Show memory usage before
mem_before = df.memory_usage(deep=True).sum()

# Convert 'color' to categorical
df['color'] = df['color'].astype('category')

# Show memory usage after
mem_after = df.memory_usage(deep=True).sum()

print(f"Memory before: {mem_before} bytes")
print(f"Memory after: {mem_after} bytes")

# Show categories
print(f"Categories: {df['color'].cat.categories.tolist()}")
Important Notes

Categorical columns use less memory because pandas stores each distinct value once and represents every row with a small integer code.
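The internal layout can be inspected through the .cat accessor. The sketch below uses a hypothetical Series built by repeating the sample colors, so the memory difference is visible; exact byte counts depend on the pandas version and are not shown.

```python
import pandas as pd

# Hypothetical data: the six sample colors repeated many times
colors = pd.Series(['red', 'blue', 'red', 'green', 'blue', 'blue'] * 10_000)
as_cat = colors.astype('category')

# The distinct values are stored once (sorted when inferred from strings)
categories = as_cat.cat.categories.tolist()

# Each row holds only an integer code pointing into the categories
codes = as_cat.cat.codes.head(6).tolist()
codes_dtype = str(as_cat.cat.codes.dtype)

# Compare total memory, including the string contents
mem_text = colors.memory_usage(deep=True)
mem_cat = as_cat.memory_usage(deep=True)

print(f"Categories: {categories}")
print(f"First codes: {codes} ({codes_dtype})")
print(f"Categorical is smaller: {mem_cat < mem_text}")
```

With only three categories the codes fit in int8, so each row costs one byte plus a shared lookup table.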

Operations like grouping or sorting can be faster with categorical data, since pandas can work on the integer codes rather than on the strings themselves.

Be careful when adding new values: assigning a value that is not in the category list raises an error, so add the new category first (for example with .cat.add_categories).
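A short sketch of that pitfall follows, using a hypothetical three-row Series. The exact exception type has varied between pandas versions, so the example catches both TypeError and ValueError as an assumption.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
colors = pd.Series(['red', 'blue', 'red'], dtype='category')

# Assigning a value outside the known categories is rejected
try:
    colors.iloc[0] = 'purple'
    assignment_failed = False
except (TypeError, ValueError):
    assignment_failed = True

# Register the new category first, then the assignment works
colors = colors.cat.add_categories(['purple'])
colors.iloc[0] = 'purple'

print(f"First assignment failed: {assignment_failed}")
print(colors.tolist())
print(colors.cat.categories.tolist())
```

New categories are appended to the end of the existing list, which matters if the categorical is ordered.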

Summary

Categorical type saves memory for repeated text data.

It can speed up some data operations.

You can set an order for categories if needed.