
Memory savings with categoricals in Pandas - Step-by-Step Execution

Concept Flow - Memory savings with categoricals
1. Load DataFrame with strings
2. Check memory usage
3. Convert string column to categorical
4. Check memory usage again
5. Compare memory before and after
6. Observe memory savings
This flow shows loading data, checking memory, converting to categorical, and then comparing memory usage to see savings.
Execution Sample
Pandas
import pandas as pd

# Create DataFrame
df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red']*1000})

# Memory before
mem_before = df.memory_usage(deep=True).sum()

# Convert to categorical
_df = df.copy()
_df['color'] = _df['color'].astype('category')

# Memory after
mem_after = _df.memory_usage(deep=True).sum()
This code creates a DataFrame with repeated color strings, checks memory, converts the column to categorical, and checks memory again.
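The sample computes the two totals but never displays them. The following is a minimal sketch that rebuilds the same DataFrame and prints the comparison; the names `saved` and `ratio` are introduced here for illustration only, and the byte counts it prints will vary with the pandas version and platform.

```python
import pandas as pd

# Same data as the sample: 5000 rows, 3 distinct values
df = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue', 'red'] * 1000})

mem_before = df.memory_usage(deep=True).sum()

_df = df.copy()
_df['color'] = _df['color'].astype('category')
mem_after = _df.memory_usage(deep=True).sum()

# 'saved' and 'ratio' are illustrative names, not part of the original sample
saved = mem_before - mem_after
ratio = mem_after / mem_before

print(f"before: {mem_before} bytes")
print(f"after:  {mem_after} bytes")
print(f"saved:  {saved} bytes ({1 - ratio:.0%} smaller)")
```

Passing `deep=True` matters here: without it, pandas may not count the memory held by the string values themselves, so the before figure can be understated.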
Execution Table
| Step | Action | Memory Usage (bytes) | Description |
| --- | --- | --- | --- |
| 1 | Create DataFrame with string column | 50000 | DataFrame with 5000 rows of repeated strings |
| 2 | Calculate memory usage before conversion | 50000 | Memory usage includes full string storage |
| 3 | Convert 'color' column to categorical | N/A | Change data type to categorical |
| 4 | Calculate memory usage after conversion | 8000 | Memory usage reduced due to category codes |
| 5 | Compare memory usage | 42000 saved | Memory reduced by storing codes instead of full strings |
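💡 Memory usage after conversion is much smaller, showing the savings from categoricals. The byte counts in the table are illustrative; exact values depend on the pandas version, the string dtype, and the platform, so run the code to see the figures on your machine.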
Variable Tracker
| Variable | Start | After Conversion | Final |
| --- | --- | --- | --- |
| df['color'].dtype | object (string) | object (string) | object (string) |
| _df['color'].dtype | N/A | category | category |
| mem_before | N/A | 50000 | 50000 |
| mem_after | N/A | 8000 | 8000 |
Key Moments - 2 Insights
Why does memory usage drop after converting to categorical?
Because pandas stores each distinct value once and represents every row as a small integer code. The codes take far less memory than repeating the full string in every row, as shown in step 4 of the Execution Table.
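The codes and the stored values can be inspected through the `.cat` accessor. This is a small sketch using the same color data; the variable names `colors` and `cat` are chosen here for illustration.

```python
import pandas as pd

colors = pd.Series(['red', 'blue', 'green', 'blue', 'red'] * 1000)
cat = colors.astype('category')

# Each distinct string is stored once, in sorted order
print(list(cat.cat.categories))        # ['blue', 'green', 'red']

# Each row holds an integer code that indexes into the categories
print(cat.cat.codes.head().tolist())   # [2, 0, 1, 0, 2]

# With only 3 categories, pandas picks the smallest integer type
print(cat.cat.codes.dtype)             # int8
```

With one byte per row for the codes, the 5000-row column needs about 5000 bytes for its data plus a small fixed cost for the three category strings.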
Does the original DataFrame change when converting to categorical?
No. The conversion is applied to a copy (`_df`), so the original DataFrame stays the same. The Variable Tracker shows that df['color'].dtype remains object throughout.
Visual Quiz - 3 Questions
Test your understanding
Look at the Execution Table. What is the memory usage after converting to categorical?
A. 8000 bytes
B. 50000 bytes
C. 42000 bytes
D. 10000 bytes
💡 Hint: Check the Step 4 row of the Execution Table for the memory after conversion.
According to the Variable Tracker, what is the dtype of the 'color' column in the original DataFrame after conversion?
A. category
B. int
C. object (string)
D. float
💡 Hint: Look at the df['color'].dtype row of the Variable Tracker, in the After Conversion column.
If the 'color' column had unique strings for every row, how would memory savings change?
A. Savings would be larger
B. Savings would be smaller or none
C. Savings would be the same
D. Memory usage would increase
💡 Hint: Think about how categorical codes save memory by reusing repeated values.
Concept Snapshot
Memory savings with categoricals in pandas:
- Use df.memory_usage(deep=True) to check memory
- Convert string columns with df['col'] = df['col'].astype('category')
- Categoricals store data as integer codes
- This reduces memory when the column has many repeated values; a column of mostly unique values may not shrink and can even grow
- The original data is unchanged if you convert a copy
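Because the benefit depends on how often values repeat, it can help to measure before converting. The sketch below compares a repeated column with an all-unique one; `category_savings` is a hypothetical helper written for this example, not a pandas function.

```python
import pandas as pd

def category_savings(series):
    """Return bytes saved by converting to category (negative means it grew).

    Hypothetical helper for illustration; not part of pandas.
    """
    before = series.memory_usage(deep=True)
    after = series.astype('category').memory_usage(deep=True)
    return before - after

# 3 distinct values across 5000 rows
repeated = pd.Series(['red', 'blue', 'green', 'blue', 'red'] * 1000)
# 5000 distinct values across 5000 rows
unique = pd.Series([f'item_{i}' for i in range(5000)])

print("repeated:", category_savings(repeated))
print("unique:  ", category_savings(unique))
```

The repeated column should show a clear saving, while the unique column should show little or none, since every string still has to be stored once and the codes are added on top.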
Full Transcript
This lesson shows how converting a string column in a pandas DataFrame to the categorical type saves memory. We start with a DataFrame of repeated color names stored as strings and check its memory usage. We then convert the column to the categorical type, which stores each distinct value once and represents every row as an integer code instead of a full string. Checking memory usage again shows a large drop. The original DataFrame remains unchanged because we convert a copy. The technique pays off when a column has many repeated values: the fewer distinct values relative to the number of rows, the larger the savings.