How to Fix Memory Error in Pandas: Simple Solutions
A MemoryError in pandas happens when your data is too large for your computer's memory. To fix it, load the data in smaller chunks with the chunksize parameter, or reduce memory usage by specifying smaller data types with pd.read_csv(..., dtype=...).

Why This Happens
A MemoryError occurs when pandas tries to load or process data that is too big to fit into your computer's RAM. This often happens with large CSV files or big DataFrames that require more memory than available.
python
import pandas as pd

data = pd.read_csv('large_file.csv')
Output
MemoryError: Unable to allocate array with shape (large_number,) and data type float64
The Fix
To fix this, load the data in smaller parts using the chunksize parameter or reduce memory by specifying smaller data types. This way, pandas only loads manageable pieces or uses less memory per column.
python
import pandas as pd

chunks = pd.read_csv('large_file.csv', chunksize=10000)
for chunk in chunks:
    # Process each chunk separately
    print(chunk.head())
Output
column1 column2 column3
0 10 20 30
1 11 21 31
2 12 22 32
3 13 23 33
4 14 24 34
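Besides chunking, you can pass dtype at read time so pandas allocates smaller columns from the start instead of the default 64-bit types. A minimal sketch, using an in-memory CSV (io.StringIO) with made-up column names to stand in for the large file:

```python
import io

import pandas as pd

# A tiny in-memory CSV standing in for 'large_file.csv'
csv_data = io.StringIO("column1,column2,column3\n10,20,30\n11,21,31\n")

# Request 32-bit columns up front so pandas never allocates
# the wider 64-bit versions
df = pd.read_csv(
    csv_data,
    dtype={"column1": "int32", "column2": "int32", "column3": "int32"},
)
print(df.dtypes)
```

Each int32 column uses half the memory of the default int64, which adds up quickly across millions of rows.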
Prevention
To avoid memory errors, always check your data size before loading. Use dtype to assign smaller data types like float32 or int8 when possible. Also, consider filtering columns or rows before loading all data.
- Use chunksize for large files.
- Convert columns to smaller types with astype().
- Drop unnecessary columns early.
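For a DataFrame that is already loaded, astype() can downcast columns in place, and memory_usage(deep=True) lets you verify the saving. A small sketch with illustrative data:

```python
import numpy as np
import pandas as pd

# Example frame with pandas' default 64-bit types
df = pd.DataFrame({
    "a": np.arange(1000, dtype="int64"),
    "b": np.arange(1000, dtype="float64"),
})
before = df.memory_usage(deep=True).sum()

# Downcast to smaller types where the value range allows it
df["a"] = df["a"].astype("int16")
df["b"] = df["b"].astype("float32")
after = df.memory_usage(deep=True).sum()

print(before, after)  # the downcast frame uses far less memory
```

Only downcast when you know the column's value range fits the smaller type; values outside an integer type's range will overflow silently.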
Related Errors
Other errors related to memory in pandas include:
- ValueError: Buffer dtype mismatch - caused by incompatible data types.
- Performance slowdown - caused by inefficient data types or large data without chunking.
- System freeze - caused by trying to load too much data at once.
Fixes usually involve optimizing data types and using chunking.
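These fixes combine naturally: read only the columns you need with usecols, request small dtypes, and process in chunks. A sketch using a tiny in-memory CSV as a stand-in for a large file:

```python
import io

import pandas as pd

# Hypothetical CSV; in practice this would be your large file on disk
csv_data = io.StringIO(
    "column1,column2,column3\n"
    "10,20,30\n"
    "11,21,31\n"
    "12,22,32\n"
)

# Read only the needed columns, with small dtypes, in chunks
total = 0
for chunk in pd.read_csv(
    csv_data,
    usecols=["column1", "column2"],  # column3 is never loaded
    dtype={"column1": "int16", "column2": "int16"},
    chunksize=2,  # two rows per chunk, just for illustration
):
    total += chunk["column1"].sum()

print(total)  # 10 + 11 + 12 = 33
```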
Key Takeaways
- MemoryError happens when data is too large for your RAM.
- Use pd.read_csv with chunksize to load data in smaller parts.
- Reduce memory by specifying smaller data types with dtype.
- Drop unused columns before loading to save memory.
- Always check data size and optimize before processing.