
Using appropriate dtypes in Pandas - Step-by-Step Execution

Concept Flow - Using appropriate dtypes
1. Load data with default dtypes
2. Check memory usage
3. Identify columns to optimize
4. Convert columns to appropriate dtypes
5. Check memory usage again
6. Use optimized data for analysis
This flow shows loading data, checking memory, converting columns to better types, and then using optimized data.
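The flow above can be sketched end to end as follows. This is a minimal illustration, not a prescribed recipe: the column names `qty` and `label` and the row counts are hypothetical, chosen only so that the saving is visible.

```python
import pandas as pd

# Hypothetical sample data; column names and sizes are illustrative only.
df = pd.DataFrame({
    "qty": [1, 2, 3, 4] * 250,
    "label": ["x", "y", "z", "x"] * 250,
})

# Load with default dtypes and record the starting memory footprint.
before = df.memory_usage(deep=True).sum()

# Inspect dtypes to spot columns worth optimizing.
print(df.dtypes)

# Convert the candidates: small integers to int8, repeated strings to category.
optimized = df.astype({"qty": "int8", "label": "category"})

# Check memory again to confirm the saving before using the data.
after = optimized.memory_usage(deep=True).sum()
print(f"before: {before} bytes, after: {after} bytes")
```

Passing `deep=True` asks pandas to count the string objects themselves, which gives a more realistic figure for text columns.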
Execution Sample
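import pandas as pd

# Build a small DataFrame; pandas infers a default dtype for each column
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['x', 'y', 'z'],
    'C': [1.0, 2.5, 3.1]
})

# Downcast 'A': every value fits in the int8 range (-128 to 127)
df['A'] = df['A'].astype('int8')
# Store 'B' as integer codes plus a lookup table of its distinct values
df['B'] = df['B'].astype('category')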
This code creates a DataFrame and changes column types to use less memory.
Execution Table
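| Step | Action | Column | Original dtype | New dtype | Memory Usage (approx) |
|------|--------|--------|----------------|-----------|-----------------------|
| 1 | Create DataFrame | A | int64 | int64 | 24 bytes |
| 2 | Create DataFrame | B | object | object | 72 bytes |
| 3 | Create DataFrame | C | float64 | float64 | 24 bytes |
| 4 | Convert dtype | A | int64 | int8 | 3 bytes |
| 5 | Convert dtype | B | object | category | 33 bytes |
| 6 | No change | C | float64 | float64 | 24 bytes |
| 7 | Total memory before optimization | - | - | - | 120 bytes |
| 8 | Total memory after optimization | - | - | - | 60 bytes |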
💡 Memory usage reduced by converting columns to smaller or categorical dtypes.
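To compare the table's approximate figures with what pandas reports, per-column memory can be measured directly. The sketch below reuses the sample DataFrame. For the numeric columns the result is simply bytes per value times the number of rows, so 'A' and 'C' should match the table. The figures for 'B' depend on the pandas and Python versions, and on a three-row column of all-distinct strings they may not show the saving the table suggests, so treat the table's values for 'B' as rough illustrations.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['x', 'y', 'z'],
    'C': [1.0, 2.5, 3.1]
})

# index=False leaves out the index; deep=True also counts the Python
# string objects that a text column refers to.
before = df.memory_usage(index=False, deep=True)

df['A'] = df['A'].astype('int8')
df['B'] = df['B'].astype('category')
after = df.memory_usage(index=False, deep=True)

# Side-by-side view of bytes per column before and after conversion.
print(pd.DataFrame({'before': before, 'after': after}))
```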
Variable Tracker
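| Variable | Start | After Step 4 | After Step 5 | Final |
|----------|-------|--------------|--------------|-------|
| df['A'].dtype | int64 | int8 | int8 | int8 |
| df['B'].dtype | object | object | category | category |
| df['C'].dtype | float64 | float64 | float64 | float64 |
| Memory usage (approx) | 120 bytes | 99 bytes | 60 bytes | 60 bytes |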
Key Moments - 3 Insights
Why do we convert 'B' from object to category dtype?
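The category dtype stores each distinct string once and replaces the column's values with small integer codes, which saves memory when a few values repeat many times. The sample's 'B' holds only three distinct values ('x', 'y', 'z'), so the drop shown between steps 2 and 5 in the execution table is illustrative; the saving matters mainly on columns with many repeats.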
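A small sketch of that effect, using a hypothetical column of three labels repeated 1000 times (the data is made up for illustration):

```python
import pandas as pd

# Hypothetical column: few distinct labels, many repeats.
labels = pd.Series(["red", "green", "blue"] * 1000)
as_category = labels.astype("category")

# Bytes used by the plain strings versus the categorical version.
plain_bytes = labels.memory_usage(index=False, deep=True)
category_bytes = as_category.memory_usage(index=False, deep=True)
print(plain_bytes, category_bytes)

# Each row is now a small integer code pointing into the category list.
print(as_category.cat.codes.dtype)
print(list(as_category.cat.categories))
```

With only three categories, the codes fit in one byte each, so the column costs roughly one byte per row plus the short lookup table.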
Why does converting 'A' from int64 to int8 reduce memory?
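int8 stores each value in 1 byte instead of the 8 bytes that int64 uses, so the three values shrink from 24 bytes to 3 bytes between steps 1 and 4 in the execution table. The conversion is safe here because every value fits in the int8 range of -128 to 127.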
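The byte sizes and the value range can be checked with NumPy, and `pd.to_numeric` with `downcast="integer"` is one way to let pandas pick the smallest integer dtype that fits. The second series below, containing 300, is a hypothetical example added to show what happens when a value exceeds the int8 range.

```python
import numpy as np
import pandas as pd

# Bytes per value for each dtype.
print(np.dtype("int64").itemsize, np.dtype("int8").itemsize)

# Smallest and largest values an int8 can hold.
info = np.iinfo("int8")
print(info.min, info.max)

# All values fit in int8, so the downcast picks int8.
small = pd.to_numeric(pd.Series([1, 2, 3]), downcast="integer")

# 300 does not fit in int8, so the downcast picks int16 instead.
big = pd.to_numeric(pd.Series([1, 2, 300]), downcast="integer")
print(small.dtype, big.dtype)
```

Unlike `astype('int8')`, which converts regardless of the values, the downcast option checks the data first and only goes as small as the values allow.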
Why didn't we change the dtype of column 'C'?
'C' holds floating-point values whose precision should be preserved. This example does not downcast to a smaller float dtype such as float32, so 'C' stays float64 in steps 3 and 6.
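To see the trade-off being avoided, the sketch below converts the same values to float32. This is not part of the tutorial's steps; it only illustrates that a smaller float dtype halves the memory while changing some values slightly.

```python
import pandas as pd

c = pd.Series([1.0, 2.5, 3.1])
c32 = c.astype("float32")

# float64 uses 8 bytes per value, float32 uses 4.
print(c.memory_usage(index=False), c32.memory_usage(index=False))

# Convert back to float64 and compare with the original values.
roundtrip = c32.astype("float64")
print(roundtrip[1] == c[1])  # 2.5 is exactly representable in binary
print(roundtrip[2] == c[2])  # 3.1 is not, so float32 rounds it differently
```

Whether that rounding matters depends on the analysis, which is why leaving 'C' as float64 is the conservative choice.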
Visual Quiz - 3 Questions
Test your understanding
In the execution table, what is the new dtype of column 'A' at step 4?
A. int8
B. int64
C. float64
D. category
💡 Hint
Check the 'New dtype' column for step 4 in the execution table.
Which step of the execution table reports total memory usage at about half of the original?
A. Step 5
B. Step 8
C. Step 6
D. Step 3
💡 Hint
Look at the 'Memory Usage (approx)' column and compare total memory before and after optimization.
If column 'B' had not been converted to category, approximately how much memory would 'B' use after step 5?
A. About 33 bytes
B. About 24 bytes
C. About 72 bytes
D. About 3 bytes
💡 Hint
Refer to the original memory usage of column 'B' at step 2 in the execution table.
Concept Snapshot
Using appropriate dtypes in pandas:
- Load data with default types
- Check memory usage
- Convert columns to smaller types (e.g., int64 to int8) when the values fit
- Convert string columns to 'category' when a few values repeat often
- Check memory again to confirm savings
- Use optimized DataFrame for faster, lighter analysis
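The checklist above could be wrapped in a small helper. The function name `shrink_dtypes` and the `max_unique_ratio` threshold are hypothetical, introduced here only for illustration; this is a sketch under those assumptions, not a pandas API, and it leaves float columns untouched.

```python
import pandas as pd

def shrink_dtypes(df, max_unique_ratio=0.5):
    """Hypothetical helper: return a copy of df with smaller dtypes."""
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if pd.api.types.is_integer_dtype(col):
            # Pick the smallest integer dtype that holds every value.
            out[name] = pd.to_numeric(col, downcast="integer")
        elif pd.api.types.is_object_dtype(col) or pd.api.types.is_string_dtype(col):
            # Use category only when values repeat often enough to pay off.
            if col.nunique() / max(len(col), 1) <= max_unique_ratio:
                out[name] = col.astype("category")
    return out

# Usage on a larger version of the sample data.
df = pd.DataFrame({
    "A": [1, 2, 3] * 100,
    "B": ["x", "y", "z"] * 100,
    "C": [1.0, 2.5, 3.1] * 100,
})
small = shrink_dtypes(df)
print(small.dtypes)
print(df.memory_usage(deep=True).sum(), small.memory_usage(deep=True).sum())
```

The ratio check reflects the point made earlier: category pays off when distinct values are few relative to the number of rows, so a column of mostly unique strings is left as it is.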
Full Transcript
This visual execution shows how to use appropriate data types in pandas to save memory. We start by creating a DataFrame with default types: integers as int64, strings as object, and floats as float64. We check memory usage, then convert the integer column to int8, which is safe because every value fits in one byte. We convert the string column to category, which stores integer codes instead of full strings and saves memory when values repeat. The float column remains unchanged to preserve precision. After the conversions, the approximate memory usage in the execution table falls from 120 bytes to 60 bytes, about half the original. On larger datasets, this kind of reduction lowers memory load and can make analysis faster.