Memory-efficient operations let you work with large datasets without exhausting your machine's RAM, which keeps your analysis fast and avoids crashes.
Memory-Efficient Operations in Python Data Analysis
Introduction
When working with very large datasets that don't fit in your computer's memory.
When running data analysis on a laptop or device with limited RAM.
When you want to speed up your program by reducing memory use.
When processing data streams or files too big to load all at once.
When you want to save memory to run multiple programs at the same time.
Syntax
```python
import pandas as pd

# Use chunksize to read a large CSV in parts
for chunk in pd.read_csv('large_file.csv', chunksize=10000):
    process(chunk)

# Use dtype to reduce the memory footprint of columns
df = pd.read_csv('file.csv', dtype={'column1': 'int32', 'column2': 'float32'})
```
Reading data in chunks lets you handle big files piece by piece.
Setting column data types smaller than default saves memory.
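When you don't want to pick each dtype by hand, pandas' `pd.to_numeric` with its `downcast` parameter can choose the smallest numeric type that fits a column's values. A minimal sketch (the `views` column name and values are illustrative):

```python
import pandas as pd

# A small DataFrame; pandas stores Python ints as int64 by default
df = pd.DataFrame({'views': [10, 250, 3000, 42]})
print(df['views'].dtype)  # int64

# Downcast to the smallest integer type that holds every value
df['views'] = pd.to_numeric(df['views'], downcast='integer')
print(df['views'].dtype)  # int16, since 3000 fits in int16 but not int8
```

This is handy after loading data, when you can inspect actual value ranges before shrinking columns.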
Examples
This example reads the CSV file in chunks of 5000 rows at a time, so the whole file is never loaded at once.
```python
import pandas as pd

# Read the CSV in small parts
for chunk in pd.read_csv('data.csv', chunksize=5000):
    print(chunk.shape)
```
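Chunked reading is most useful when you combine the pieces into one result. A sketch of aggregating across chunks, using an in-memory CSV (via `io.StringIO`) as a stand-in for a large file on disk:

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a large file on disk
csv_data = io.StringIO("value\n" + "\n".join(str(i) for i in range(100)))

# Accumulate a running total without ever holding the full file in memory
total = 0
for chunk in pd.read_csv(csv_data, chunksize=25):
    total += chunk['value'].sum()

print(total)  # 4950, the sum of 0..99
```

The same pattern works for counts, min/max, or per-group partial sums that you merge at the end.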
Here, we tell pandas to use smaller data types for columns to save memory.
```python
import pandas as pd

dtypes = {'age': 'int8', 'salary': 'float32'}
df = pd.read_csv('employees.csv', dtype=dtypes)
print(df.memory_usage(deep=True))
```
Changing text columns to 'category' type saves memory by storing repeated values efficiently.
```python
import pandas as pd

# A small DataFrame with repeated city names (stands in for real data)
df = pd.DataFrame({'city': ['Paris', 'London', 'Paris', 'Tokyo'] * 1000})

# Convert the column to the category type
df['city'] = df['city'].astype('category')
print(df.memory_usage(deep=True))
```
Sample Program
This program creates a DataFrame with one million rows and prints the memory used before and after converting columns to smaller types.
```python
import pandas as pd
import numpy as np

# Create a large DataFrame with default types
size = 1000000
names = ['Alice', 'Bob', 'Charlie', 'David']

# The default object dtype for names uses more memory
df = pd.DataFrame({
    'id': np.arange(size),
    'age': np.random.randint(18, 70, size),
    'name': np.random.choice(names, size)
})

print('Memory usage before optimization:')
print(df.memory_usage(deep=True))

# Convert the 'name' column to category to save memory
df['name'] = df['name'].astype('category')

# Change 'age' to a smaller integer type
df['age'] = df['age'].astype('int8')

print('\nMemory usage after optimization:')
print(df.memory_usage(deep=True))
```
Important Notes
Smaller data types can silently wrap around or lose precision if values don't fit their range, so check value ranges before downcasting.
Converting text columns to 'category' is very effective when many repeated values exist.
Reading data in chunks is useful when you cannot load all data at once.
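To illustrate the first note above, here is a quick sketch of what happens when a value doesn't fit the target integer type (NumPy's `astype` wraps out-of-range values rather than raising an error):

```python
import numpy as np

# 300 does not fit in int8 (range -128..127); astype wraps silently
arr = np.array([300]).astype('int8')
print(arr[0])  # 44, i.e. 300 - 256

# Check a type's range before downcasting
print(np.iinfo('int8').min, np.iinfo('int8').max)  # -128 127
```

Comparing a column's actual min and max against `np.iinfo` (or `np.finfo` for floats) before converting avoids this silent corruption.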
Summary
Memory-efficient operations help handle big data on limited memory.
Use chunked reading and smaller data types to save memory.
Converting text columns to 'category' type reduces memory for repeated values.