Memory-efficient operations let you work with large datasets without exhausting your machine's RAM, which keeps your analysis fast and avoids crashes.
Memory-Efficient Operations in Python Data Analysis
Introduction
When working with very large datasets that don't fit in your computer's memory.
When running data analysis on a laptop or device with limited RAM.
When you want to speed up your program by reducing memory use.
When processing data streams or files too big to load all at once.
When you want to save memory to run multiple programs at the same time.
Syntax
```python
import pandas as pd

# Use chunksize to read a large CSV in parts
for chunk in pd.read_csv('large_file.csv', chunksize=10000):
    process(chunk)

# Use dtype to reduce the memory footprint of columns
df = pd.read_csv('file.csv', dtype={'column1': 'int32', 'column2': 'float32'})
```
Reading data in chunks lets you handle big files piece by piece.
Setting column data types smaller than default saves memory.
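When you don't want to pick each dtype by hand, pandas' `pd.to_numeric` with its `downcast` parameter can choose the smallest numeric type that fits a column's values. A minimal sketch (the `views` column name and values are illustrative):

```python
import pandas as pd

# A small DataFrame; pandas stores Python ints as int64 by default
df = pd.DataFrame({'views': [10, 250, 3000, 42]})
print(df['views'].dtype)  # int64

# Downcast to the smallest integer type that holds every value
df['views'] = pd.to_numeric(df['views'], downcast='integer')
print(df['views'].dtype)  # int16, since 3000 fits in int16 but not int8
```

This is handy after loading data, when you can inspect actual value ranges before shrinking columns.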
Examples
This example reads the CSV file in chunks of 5000 rows at a time, so the whole file is never loaded at once.
```python
import pandas as pd

# Read the CSV in small parts
for chunk in pd.read_csv('data.csv', chunksize=5000):
    print(chunk.shape)
```
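Chunked reading is most useful when you combine the pieces into one result. A sketch of aggregating across chunks, using an in-memory CSV (via `io.StringIO`) as a stand-in for a large file on disk:

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a large file on disk
csv_data = io.StringIO("value\n" + "\n".join(str(i) for i in range(100)))

# Accumulate a running total without ever holding the full file in memory
total = 0
for chunk in pd.read_csv(csv_data, chunksize=25):
    total += chunk['value'].sum()

print(total)  # 4950, the sum of 0..99
```

The same pattern works for counts, min/max, or per-group partial sums that you merge at the end.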
Here, we tell pandas to use smaller data types for columns to save memory.
```python
import pandas as pd

dtypes = {'age': 'int8', 'salary': 'float32'}
df = pd.read_csv('employees.csv', dtype=dtypes)
print(df.memory_usage(deep=True))
```
Changing text columns to 'category' type saves memory by storing repeated values efficiently.
```python
import pandas as pd

# A small DataFrame with repeated city names (stands in for real data)
df = pd.DataFrame({'city': ['Paris', 'London', 'Paris', 'Tokyo'] * 1000})

# Convert the column to the category type
df['city'] = df['city'].astype('category')
print(df.memory_usage(deep=True))
```
Sample Program
This program creates a DataFrame with one million rows and prints the memory used before and after converting columns to smaller types.
```python
import pandas as pd
import numpy as np

# Create a large DataFrame with default types
size = 1000000
names = ['Alice', 'Bob', 'Charlie', 'David']

# The default object dtype for names uses more memory
df = pd.DataFrame({
    'id': np.arange(size),
    'age': np.random.randint(18, 70, size),
    'name': np.random.choice(names, size)
})

print('Memory usage before optimization:')
print(df.memory_usage(deep=True))

# Convert the 'name' column to category to save memory
df['name'] = df['name'].astype('category')

# Change 'age' to a smaller integer type
df['age'] = df['age'].astype('int8')

print('\nMemory usage after optimization:')
print(df.memory_usage(deep=True))
```
Important Notes
Smaller data types can silently wrap around or lose precision if values don't fit their range, so check value ranges before downcasting.
Converting text columns to 'category' is very effective when many repeated values exist.
Reading data in chunks is useful when you cannot load all data at once.
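To illustrate the first note above, here is a quick sketch of what happens when a value doesn't fit the target integer type (NumPy's `astype` wraps out-of-range values rather than raising an error):

```python
import numpy as np

# 300 does not fit in int8 (range -128..127); astype wraps silently
arr = np.array([300]).astype('int8')
print(arr[0])  # 44, i.e. 300 - 256

# Check a type's range before downcasting
print(np.iinfo('int8').min, np.iinfo('int8').max)  # -128 127
```

Comparing a column's actual min and max against `np.iinfo` (or `np.finfo` for floats) before converting avoids this silent corruption.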
Summary
Memory-efficient operations help handle big data on limited memory.
Use chunked reading and smaller data types to save memory.
Converting text columns to 'category' type reduces memory for repeated values.