
Memory savings with categoricals in Pandas

Introduction

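The categorical data type saves memory when a column contains repeated text values. Each unique value is stored once, and every row holds a small integer code that points to it. This makes your data smaller and can make it faster to work with.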

Categoricals are a good fit when:
You have a column with many repeated text values, like colors or categories.
You want to reduce the memory your data uses so you can work faster.
You plan to do analysis on columns with a limited number of unique values.
You want to prepare data for machine learning and reduce its size.
You want to improve performance when filtering or grouping data.
Syntax
Pandas
df['column'] = df['column'].astype('category')
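This changes the column type to 'category', which stores each unique value once and represents every row with an integer code.
Categorical columns use less memory but behave differently from normal text columns. For example, assigning a value that is not already one of the categories raises an error.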
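To illustrate how a categorical column behaves differently from a text column, here is a minimal sketch. The series `s` and the value 'purple' are made up for the example. It shows that assigning an unknown value is rejected, and one possible way around it using `cat.add_categories`.

```python
import pandas as pd

# Hypothetical example data: a small categorical series
s = pd.Series(['red', 'blue', 'red'], dtype='category')

# 'purple' is not one of the existing categories, so pandas rejects it.
# The exception type has varied across pandas versions, so both are caught.
try:
    s.iloc[0] = 'purple'
    assignment_failed = False
except (TypeError, ValueError) as exc:
    assignment_failed = True
    print(f"Assignment rejected: {exc}")

# Register the new category first, then the assignment works
s = s.cat.add_categories(['purple'])
s.iloc[0] = 'purple'
print(s.tolist())
```

A plain text column would accept the new value without complaint, so this check is something to keep in mind after converting.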
Examples
This converts the 'color' column to categorical, saving memory by storing unique colors once.
Pandas
import pandas as pd

data = ['red', 'blue', 'red', 'green', 'blue']
df = pd.DataFrame({'color': data})
df['color'] = df['color'].astype('category')
This creates a new column holding the integer code pandas assigns to each category (missing values get -1). It is useful when you need a numeric version of the column for analysis.
Pandas
df['color_codes'] = df['color'].cat.codes
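The relationship between codes and categories can be sketched as follows. This reuses the small color data from the example above; the names `code_to_label` and `decoded` are introduced only for illustration. When pandas infers categories from sortable text, it orders them alphabetically, and each code is the position of the value in that list.

```python
import pandas as pd

original = ['red', 'blue', 'red', 'green', 'blue']
df = pd.DataFrame({'color': original})
df['color'] = df['color'].astype('category')

# Categories are the unique values; codes are positions into that list
categories = df['color'].cat.categories
codes = df['color'].cat.codes.tolist()

# Build a lookup from code to label (illustrative helper, not a pandas API)
code_to_label = dict(enumerate(categories))
print(code_to_label)
print(codes)

# Mapping each code back through the lookup recovers the original values
decoded = [code_to_label[c] for c in codes]
print(decoded)
```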
Sample Program

This code shows how converting a text column to categorical reduces memory use and lists the unique categories.

Pandas
import pandas as pd

# Create a DataFrame with repeated text values
colors = ['red', 'blue', 'red', 'green', 'blue', 'green', 'red', 'blue']
df = pd.DataFrame({'color': colors})

# Check memory usage before conversion
mem_before = df.memory_usage(deep=True)['color']

# Convert to categorical type
df['color'] = df['color'].astype('category')

# Check memory usage after conversion
mem_after = df.memory_usage(deep=True)['color']

print(f"Memory before: {mem_before} bytes")
print(f"Memory after: {mem_after} bytes")

# Show the DataFrame and categories
print(df)
print(f"Categories: {df['color'].cat.categories.tolist()}")
Important Notes

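Categorical columns work best when values repeat often and there are few unique items. If almost every value is unique, a categorical column can use more memory than the original text column.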
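The effect of the number of unique values can be checked with a rough sketch like the one below. The data is synthetic and the helper `memory_before_after` is a name made up for this example. Exact byte counts depend on your pandas version, so the code only compares before and after.

```python
import pandas as pd

def memory_before_after(series):
    """Return (bytes as text column, bytes as categorical column)."""
    before = series.memory_usage(deep=True)
    after = series.astype('category').memory_usage(deep=True)
    return before, after

# Few unique values, repeated many times (assumed example data)
low = pd.Series(['red', 'green', 'blue', 'yellow'] * 25_000)

# Every value is unique (assumed example data)
high = pd.Series([f'id_{i}' for i in range(100_000)])

low_before, low_after = memory_before_after(low)
high_before, high_after = memory_before_after(high)

print(f"Few unique values:  {low_before} -> {low_after} bytes")
print(f"All unique values:  {high_before} -> {high_after} bytes")
```

With all-unique values, the categorical still has to store every string once and then adds the integer codes on top, which is why it does not help there.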

Operations like sorting or grouping can be faster on categorical columns.

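Be careful when mixing categorical and non-categorical data, or categoricals with different categories. Combining them can silently turn the result back into a regular text column, and comparing two categoricals with different categories raises an error.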
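Here is a small sketch of what can happen when categoricals are mixed, using two made-up series `a` and `b` whose categories differ. It also shows one possible way to keep the categorical type, using `union_categoricals` from `pandas.api.types`. Treat it as an illustration rather than the only approach.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Two categorical series with different sets of categories (example data)
a = pd.Series(['red', 'blue'], dtype='category')
b = pd.Series(['green', 'red'], dtype='category')

# Plain concatenation drops the categorical type when categories differ
combined = pd.concat([a, b], ignore_index=True)
print(combined.dtype)

# union_categoricals merges the category sets and keeps the type
merged = pd.Series(union_categoricals([a, b]))
print(merged.dtype)
print(merged.tolist())
```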

Summary

Categoricals save memory by storing repeated text values once.

Use astype('category') to convert columns.

Memory savings make data analysis faster and more efficient.