
Data type optimization in Data Analysis Python

Introduction

Data type optimization means storing each column in the smallest data type that still holds its values correctly. This reduces memory use and often speeds up analysis.

It is worth considering in these situations:
When your dataset is very large and uses a lot of memory.
When you want your data analysis to run faster on your computer.
When you plan to share data and want to reduce file size (this applies to formats that store column types, not to plain text such as CSV).
When you notice your program is slow or crashes due to memory limits.
When preparing data for machine learning models that need efficient input.
Syntax
Data Analysis Python
df['column'] = df['column'].astype(new_type)
Use pandas DataFrame's astype() method to change a column's data type.
Common new_type values: 'int8', 'int16', 'float32', 'category', 'bool'.
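As an alternative to naming the target type by hand, pandas can pick one for you. The sketch below uses pd.to_numeric with its downcast parameter, which is a different technique from the astype() syntax above. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({'count': [1, 2, 3, 100],
                   'price': [1.5, 2.25, 3.0, 4.75]})

# downcast='integer' chooses the smallest signed integer type that holds all values
df['count'] = pd.to_numeric(df['count'], downcast='integer')

# downcast='float' chooses the smallest float type, never smaller than float32
df['price'] = pd.to_numeric(df['price'], downcast='float')

print(df.dtypes)
```

This is convenient for a first pass, but naming the type explicitly with astype() makes the intended range of each column easier to see.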
Examples
Change the 'age' column to 8-bit integers to save memory. This is safe only if every value fits in the int8 range of -128 to 127.
Data Analysis Python
df['age'] = df['age'].astype('int8')
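Before converting to a smaller integer type, it can help to confirm that the values actually fit. The following is a minimal sketch; fits_in is a hypothetical helper name and the sample values are invented.

```python
import numpy as np
import pandas as pd

def fits_in(series, dtype):
    """Return True if every value of an integer series fits in the given integer dtype."""
    info = np.iinfo(np.dtype(dtype))
    return bool(series.min() >= info.min and series.max() <= info.max)

ages = pd.Series([25, 32, 40, 28])
big = pd.Series([25, 32, 400])   # 400 is outside the int8 range

print(fits_in(ages, 'int8'))   # True
print(fits_in(big, 'int8'))    # False
print(fits_in(big, 'int16'))   # True
```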
Convert the 'gender' column to the category type to save space when it has only a few unique values repeated across many rows.
Data Analysis Python
df['gender'] = df['gender'].astype('category')
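Whether category pays off depends on how often values repeat. A rough way to judge this, sketched below with invented data, is to compare the share of unique values and the memory used before and after conversion.

```python
import pandas as pd

# Hypothetical data: 1,000 rows but only two distinct labels
gender = pd.Series(['M', 'F'] * 500)

# A low ratio of unique values to rows suggests category is a good fit
unique_ratio = gender.nunique() / len(gender)

before = gender.memory_usage(deep=True)
after = gender.astype('category').memory_usage(deep=True)

print(f"unique ratio: {unique_ratio:.3f}")
print(f"before: {before} bytes, as category: {after} bytes")
```

If nearly every value is unique, such as an ID column, category is unlikely to help and may use more memory.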
Use 32-bit floats instead of 64-bit floats to halve the memory used by decimal numbers. Note that float32 keeps only about 7 significant digits.
Data Analysis Python
df['income'] = df['income'].astype('float32')
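To see whether float32 is precise enough for a column, one option is to convert, convert back, and measure the difference. This is only a sketch with made-up numbers.

```python
import pandas as pd

income = pd.Series([50000.0, 60000.0, 65000.0, 58000.0])
converted = income.astype('float32')

# Largest absolute difference introduced by the conversion
max_error = (converted.astype('float64') - income).abs().max()
print(max_error)

# A value with many significant digits does not survive the round trip
precise = pd.Series([123456789.123])
round_trip = precise.astype('float32').astype('float64')
print(precise.iloc[0], round_trip.iloc[0])
```

The income values here are whole numbers small enough to be stored exactly, so the error is zero. The second series shows a case where digits are lost.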
Sample Program

This code creates a small table with age, gender, and income. It prints the original data types and memory used, converts each column to a smaller type, and prints the new data types and memory usage. The second memory figure should be lower than the first.

Data Analysis Python
import pandas as pd

# Create a sample DataFrame
data = {'age': [25, 32, 40, 28],
        'gender': ['M', 'F', 'F', 'M'],
        'income': [50000.0, 60000.0, 65000.0, 58000.0]}

df = pd.DataFrame(data)

# Check original data types and memory usage
print('Original dtypes:')
print(df.dtypes)
print(f"Memory usage: {df.memory_usage(deep=True).sum()} bytes")

# Optimize data types
df['age'] = df['age'].astype('int8')
df['gender'] = df['gender'].astype('category')
df['income'] = df['income'].astype('float32')

# Check new data types and memory usage
print('\nOptimized dtypes:')
print(df.dtypes)
print(f"Memory usage: {df.memory_usage(deep=True).sum()} bytes")
Important Notes

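Changing data types can reduce memory, but a type that is too small loses information: integers outside the target range cannot be stored correctly, and float32 rounds values that need more than about 7 significant digits.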

Use 'category' for columns with few unique text values to save space.

After converting, compare the new values with the originals to catch overflow or rounding.

Summary

Data type optimization saves memory and often speeds up data work.

Use pandas astype() to change column types to smaller or categorical types.

Check memory before and after to see the improvement.
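The before-and-after check can be wrapped in a small helper that reports the saving per column. This is a sketch only; memory_report is a hypothetical name and the data is invented.

```python
import pandas as pd

def memory_report(before, after):
    """Per-column bytes before and after a dtype conversion (hypothetical helper)."""
    report = pd.DataFrame({
        'before': before.memory_usage(deep=True, index=False),
        'after': after.memory_usage(deep=True, index=False),
    })
    report['saved'] = report['before'] - report['after']
    return report

original = pd.DataFrame({'age': [25, 32, 40, 28],
                         'income': [50000.0, 60000.0, 65000.0, 58000.0]})

# astype() also accepts a dict mapping column names to types
optimized = original.astype({'age': 'int8', 'income': 'float32'})

report = memory_report(original, optimized)
print(report)
```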