Categorical data type optimization in Data Analysis Python - Step-by-Step Execution

Concept Flow - Categorical data type optimization

Load DataFrame with object columns

↓

Check unique values count

↓

Convert suitable columns to 'category'

↓

Compare memory usage before and after

↓

Use optimized DataFrame for analysis

This flow shows how to convert object columns with repeated values into categorical type to save memory and speed up analysis.
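The flow above can be sketched on a small DataFrame. This is a minimal illustration, not a definitive recipe: the helper `optimize_object_columns`, its `max_unique_ratio` threshold of 0.5, and the `city` / `user_id` sample columns are all assumptions made up for the example.

```python
import pandas as pd


def optimize_object_columns(df, max_unique_ratio=0.5):
    """Return a copy of df with low-cardinality object columns as 'category'.

    Hypothetical helper: the 0.5 threshold is an illustrative assumption.
    """
    out = df.copy()
    n_rows = len(out)
    for col in out.select_dtypes(include=["object"]).columns:
        # Convert only when unique values are a small share of all rows
        if n_rows and out[col].nunique() / n_rows <= max_unique_ratio:
            out[col] = out[col].astype("category")
    return out


# Made-up sample data; dtype=object is forced so the example behaves the same
# on pandas versions that infer a dedicated string dtype by default.
df = pd.DataFrame({
    "city": pd.Series(["Paris", "Lima", "Paris", "Oslo"] * 250, dtype=object),
    "user_id": pd.Series([f"u{i}" for i in range(1000)], dtype=object),
})

# Step 2 of the flow: check unique value counts per column
print(df.nunique())

# Step 3: convert suitable columns
optimized = optimize_object_columns(df)
print(optimized.dtypes)

# Step 4: compare memory usage before and after (deep=True counts the strings)
before = df.memory_usage(deep=True).sum()
after = optimized.memory_usage(deep=True).sum()
print(f"before: {before} bytes, after: {after} bytes")
```

Here `city` has only three distinct values and is converted, while `user_id` is unique per row and stays as object.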

Execution Sample

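import pandas as pd

# Create sample data: six fruit names, three distinct values
s = pd.Series(['apple', 'banana', 'apple', 'orange', 'banana', 'banana'])

# Convert to category: unique values are stored once, rows become integer codes
d = s.astype('category')

# Show memory usage in bytes (index included). The default deep=False does not
# count the string contents of an object column; pass deep=True to include them.
print(s.memory_usage(), d.memory_usage())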

This code creates a series of fruit names, converts it to a categorical type, and compares memory usage.

Execution Table

Step	Action	Input/Variable	Result/Output
1	Create Series s	['apple', 'banana', 'apple', 'orange', 'banana', 'banana']	s: object dtype, 6 elements
2	Check s.memory_usage()	s	Memory usage: 176 bytes
3	Convert s to category	s.astype('category')	d: categorical dtype, 6 elements
4	Check d.memory_usage()	d	Memory usage: 136 bytes
5	Compare memory usage	s.memory_usage() vs d.memory_usage()	Categorical uses less memory
6	Use d for analysis	d	Optimized data for faster operations

💡 Conversion is done and memory usage is lower. Exact byte counts depend on the pandas version and platform.
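The byte counts in the table come from `memory_usage()` with its default `deep=False`, which for an object Series counts only the 8-byte references and not the strings they point to. The sketch below, which assumes a longer Series built by repeating the same six fruits, compares shallow and deep measurements; no expected numbers are shown because they vary by pandas version.

```python
import pandas as pd

# The same six fruits repeated many times; dtype=object is forced so the
# Series is an object column regardless of the pandas default string dtype.
fruits = ["apple", "banana", "apple", "orange", "banana", "banana"]
s = pd.Series(fruits * 10_000, dtype=object)
d = s.astype("category")

shallow_obj = s.memory_usage()           # references only
deep_obj = s.memory_usage(deep=True)     # references plus string contents
deep_cat = d.memory_usage(deep=True)     # integer codes plus 3 categories

print(f"object (shallow): {shallow_obj} bytes")
print(f"object (deep):    {deep_obj} bytes")
print(f"category (deep):  {deep_cat} bytes")
```

With `deep=True` the gap between the object and categorical versions is much wider than the shallow figures suggest.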

Variable Tracker

Variable	Start	After Conversion	Final
s	Series of strings (object)	Same (unchanged)	Series of strings (object)
d	Not defined	Series of categorical type	Series of categorical type

Key Moments - 3 Insights

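Why does converting to 'category' reduce memory usage?
pandas stores each unique value once and represents the rows as integer codes, instead of keeping a full string for every row.

Can all object columns be converted to categorical?
Not every column benefits. Conversion pays off only for columns with repeated values and a limited number of unique entries.

Does converting to categorical change the data values?
No. The values stay the same; only the internal storage changes.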
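To make these points concrete, the categories and integer codes of the sample Series can be inspected through the `.cat` accessor. This is a small sketch using the same fruit data as the execution sample; the variable `values_match` is introduced here only for illustration.

```python
import pandas as pd

s = pd.Series(["apple", "banana", "apple", "orange", "banana", "banana"])
d = s.astype("category")

# Unique values, stored once
print(d.cat.categories.tolist())   # ['apple', 'banana', 'orange']

# One small integer per row, pointing into the categories
print(d.cat.codes.tolist())        # [0, 1, 0, 2, 1, 1]
print(d.cat.codes.dtype)           # int8

# Converting back gives the original values, so the data itself is unchanged
values_match = bool((d.astype(object) == s.astype(object)).all())
print(values_match)                # True
```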

Visual Quiz - 3 Questions

Test your understanding

Look at the Execution Table. What is the memory usage of the original Series s at step 2?

A. 136 bytes

B. 176 bytes

C. 6 bytes

D. 200 bytes

Concept Snapshot

Categorical data type optimization:
- Use Series.astype('category') to convert an object column
- Saves memory by storing each unique value once and an integer code per row
- Can speed up operations such as grouping and comparisons on repeated values
- Best for columns with few unique entries
- Check memory usage before and after conversion
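The "few unique entries" point can be checked directly. The sketch below assumes two made-up Series, one with three distinct values and one where every value is unique, and a hypothetical helper `bytes_saved` that reports how many bytes the conversion saves (negative means the categorical version is larger).

```python
import pandas as pd

n = 10_000
colors = ("red", "green", "blue")

# Low cardinality: 3 distinct values across 10,000 rows
few_unique = pd.Series([colors[i % 3] for i in range(n)], dtype=object)

# High cardinality: every row is different
all_unique = pd.Series([f"id_{i}" for i in range(n)], dtype=object)


def bytes_saved(series):
    """Hypothetical helper: deep memory of the original minus the categorical."""
    original = series.memory_usage(deep=True)
    converted = series.astype("category").memory_usage(deep=True)
    return original - converted


print("few unique:", bytes_saved(few_unique))   # positive: conversion helps
print("all unique:", bytes_saved(all_unique))   # negative: conversion costs memory
```

When every value is unique, the categorical version still has to store all the strings and adds an integer code per row on top, so it ends up larger.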

Full Transcript

We start with a Series of strings representing fruits. As an object Series it uses more memory, because every row holds a reference to a full Python string. Converting the Series to a categorical type makes pandas store each unique fruit once and represent the rows as small integer codes. This reduces memory usage, as the before-and-after comparison shows. Not every column benefits: only those with repeated values and a limited number of unique entries. The values themselves look and compare the same; only the internal storage changes. The optimization reduces memory load and can speed up data analysis.