Memory savings with categoricals in Pandas - Step-by-Step Execution

Concept Flow - Memory savings with categoricals

Load DataFrame with strings

↓

Check memory usage

↓

Convert string column to categorical

↓

Check memory usage again

↓

Compare memory before and after

↓

Observe memory savings

This flow shows loading data, checking memory, converting to categorical, and then comparing memory usage to see savings.

Execution Sample

import pandas as pd

# Create DataFrame
df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']*1000})

# Memory before
mem_before = df.memory_usage(deep=True).sum()

# Convert to categorical
_df = df.copy()
_df['color'] = _df['color'].astype('category')

# Memory after
mem_after = _df.memory_usage(deep=True).sum()

This code creates a DataFrame with repeated color strings, checks memory, converts the column to categorical, and checks memory again.
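The sample stores both measurements but never displays them. A minimal sketch of the comparison step follows; the names `converted`, `saved`, and `percent_saved` are introduced here for illustration, and the exact byte counts you see will depend on your pandas and Python versions.

```python
import pandas as pd

# Same data as the sample: 5 color strings repeated 1000 times (5000 rows)
df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red'] * 1000})

# deep=True makes pandas inspect the actual string data of object columns
mem_before = int(df.memory_usage(deep=True).sum())

# Convert a copy so the original DataFrame stays as it was
converted = df.copy()
converted['color'] = converted['color'].astype('category')
mem_after = int(converted.memory_usage(deep=True).sum())

# Compare the two measurements
saved = mem_before - mem_after
percent_saved = 100 * saved / mem_before

print(f"before: {mem_before} bytes")
print(f"after:  {mem_after} bytes")
print(f"saved:  {saved} bytes ({percent_saved:.1f}%)")
```

Running this on your own machine is the most reliable way to see the real figures for your environment.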

Execution Table

Step	Action	Memory Usage (bytes)	Description
1	Create DataFrame with string column	50000	DataFrame with 5000 rows of repeated strings
2	Calculate memory usage before conversion	50000	Memory usage includes full string storage
3	Convert 'color' column to categorical	N/A	Change data type to categorical
4	Calculate memory usage after conversion	8000	Memory usage reduced due to category codes
5	Compare memory usage	42000 saved	Memory reduced by storing codes instead of full strings

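💡 The byte counts in the table are illustrative; exact figures depend on your pandas and Python versions. For this data, memory usage after conversion is much smaller, which shows the savings from categoricals.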

Variable Tracker

Variable	Start	After Conversion	Final
df['color'].dtype	object (string)	object (string)	object (string)
_df['color'].dtype	N/A	category	category
mem_before	N/A	50000	50000
mem_after	N/A	8000	8000

Key Moments - 2 Insights

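Why does memory usage drop after converting to categorical?
A categorical column stores each distinct value once and represents every row as a small integer code. Here that means three category labels plus 5000 codes instead of 5000 full strings.

Does the original DataFrame change when converting to categorical?
No. The conversion is applied to the copy _df, so df['color'] keeps its original dtype, as the Variable Tracker shows.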

Visual Quiz - 3 Questions

Test your understanding

Look at the Execution Table. What is the memory usage after converting to categorical?

A. 8000 bytes

B. 50000 bytes

C. 42000 bytes

D. 10000 bytes

Concept Snapshot

Memory savings with categoricals in pandas:
- Use df.memory_usage(deep=True) to check memory
- Convert string columns with df['col'] = df['col'].astype('category')
- Categoricals store each distinct value once and represent rows as integer codes
- This reduces memory when a column has many repeated values
- The original data stays unchanged if you convert a copy
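The "many repeated values" condition matters. As a rough sketch, the hypothetical helper `category_ratio` below compares a column of repeated colors with a column in which every value is distinct. A ratio below 1 means the categorical version is smaller; when every value is distinct there is nothing to deduplicate, so expect a ratio near or above 1. The `id_` strings are made-up sample data.

```python
import pandas as pd

def category_ratio(series):
    """Return categorical memory divided by original memory (hypothetical helper)."""
    before = series.memory_usage(deep=True)
    after = series.astype('category').memory_usage(deep=True)
    return after / before

n = 5000

# Few distinct values, repeated many times
repeated = pd.Series(['red', 'blue', 'green', 'blue', 'red'] * (n // 5))

# Every value distinct: the categories are as large as the data itself
unique = pd.Series([f"id_{i}" for i in range(n)])

repeated_ratio = category_ratio(repeated)
unique_ratio = category_ratio(unique)

print(f"repeated values: {repeated_ratio:.2f}")
print(f"all unique:      {unique_ratio:.2f}")
```

Checking the ratio on a sample of your own column before converting is a cheap way to confirm the conversion is worthwhile.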

Full Transcript

This lesson shows how converting string columns in a pandas DataFrame to categorical type saves memory. We start with a DataFrame of repeated color names stored as strings. We check memory usage before conversion, then convert the column to categorical type, which stores data as integer codes instead of full strings. After conversion, we check memory usage again and see a big drop. The original DataFrame remains unchanged if we work on a copy. This technique is useful when columns have many repeated values, reducing memory use efficiently.