Large files can be too big to load into memory all at once; processing them in parts saves both memory and time.
Working with large files efficiently in NumPy
Introduction
You have a huge dataset that does not fit into your computer's memory.
You want to process data in parts instead of loading everything at once.
You need to read or write large numerical data quickly.
You want to avoid your program crashing due to memory overload.
You want to speed up data analysis by handling data in chunks.
Syntax
NumPy
import numpy as np

# Load part of a large file using memory mapping
array = np.memmap('filename.dat', dtype='float32', mode='r', shape=(1000, 1000))
np.memmap lets you access small parts of a big file without loading it all.
You specify the data type, mode (read or write), and shape of the data.
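To make the lazy-loading behavior concrete, here is a small self-contained sketch. The file name and the temporary directory are illustrative; the point is that creating the memmap reads nothing, and slicing it pulls only the requested portion from disk.

```python
import numpy as np
import tempfile, os

# Illustrative file path; any writable location works.
path = os.path.join(tempfile.mkdtemp(), 'demo.dat')

# Write 1,000 float32 values to disk in raw binary form.
np.arange(1000, dtype='float32').tofile(path)

# Map the file read-only; no data is loaded at this point.
arr = np.memmap(path, dtype='float32', mode='r', shape=(1000,))

# Only this slice is actually read from disk.
print(arr[10:15])  # [10. 11. 12. 13. 14.]
```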
Examples
This opens a large file as if it were an array, but only loads parts when needed.
NumPy
import numpy as np

# Memory-map a large binary file for reading
data = np.memmap('data.bin', dtype='float64', mode='r', shape=(5000, 5000))
This creates a large file and writes the numbers 0 through 9999 without holding everything in memory.
NumPy
import numpy as np

# Create a new memory-mapped file for writing
mmap_array = np.memmap('newfile.dat', dtype='int32', mode='w+', shape=(10000,))
mmap_array[:] = np.arange(10000)
</mmap_array>
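When writing through a memmap, call flush() to push buffered changes to disk, then reopen the file read-only to confirm the data landed. A minimal sketch, using a temporary path in place of a real file name:

```python
import numpy as np
import tempfile, os

# Illustrative path for the demo; substitute your own file name.
path = os.path.join(tempfile.mkdtemp(), 'newfile.dat')

# Write through a memmap, then flush buffered changes to disk.
mmap_array = np.memmap(path, dtype='int32', mode='w+', shape=(10000,))
mmap_array[:] = np.arange(10000)
mmap_array.flush()

# Reopen read-only and verify the first and last values.
readback = np.memmap(path, dtype='int32', mode='r', shape=(10000,))
print(readback[0], readback[-1])  # 0 9999
```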
For text files like CSV, read in smaller parts and convert to numpy arrays for processing.
NumPy
import numpy as np
import pandas as pd

# Read a large CSV file in chunks using pandas and convert to numpy arrays
chunks = pd.read_csv('large.csv', chunksize=10000)
for chunk in chunks:
    arr = chunk.to_numpy()  # process arr here
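If pandas is not available, plain NumPy can also read a numeric text file in chunks via the skiprows and max_rows parameters of np.loadtxt. A sketch, generating a small CSV at a temporary path so the example is self-contained:

```python
import numpy as np
import tempfile, os

# Generate a small numeric CSV for the demo: 50 rows, 2 columns, values 0..99.
path = os.path.join(tempfile.mkdtemp(), 'large.csv')
np.savetxt(path, np.arange(100).reshape(50, 2), delimiter=',')

# Read the file 10 rows at a time, processing each chunk as it arrives.
chunk_size = 10
total = 0.0
for start in range(0, 50, chunk_size):
    chunk = np.loadtxt(path, delimiter=',', skiprows=start, max_rows=chunk_size)
    total += chunk.sum()  # process each chunk here

print(total)  # 4950.0, the sum of 0..99
```

Note that each np.loadtxt call re-scans the skipped rows from the start of the file, so pandas chunking is usually faster for very large files.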
Sample Program
This program creates a large file containing 1 million evenly spaced numbers between 0 and 1, then reads only 10 of them without loading the whole file into memory.
NumPy
import numpy as np

# Create a large memory-mapped file and write data
filename = 'large_data.dat'
size = 1000000  # 1 million elements

# Create file with zeros
mmap_array = np.memmap(filename, dtype='float32', mode='w+', shape=(size,))
mmap_array[:] = np.linspace(0, 1, size)

# Flush changes to disk
mmap_array.flush()

# Now read only a slice without loading entire file
mmap_read = np.memmap(filename, dtype='float32', mode='r', shape=(size,))
slice_data = mmap_read[100000:100010]
print(slice_data)
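The same memmap can also be processed in fixed-size blocks, so only one block is resident at a time. A sketch building on the sample program's layout, with a temporary path standing in for the real file:

```python
import numpy as np
import tempfile, os

# Recreate the sample program's file at a temporary path.
path = os.path.join(tempfile.mkdtemp(), 'large_data.dat')
size = 1000000
mm = np.memmap(path, dtype='float32', mode='w+', shape=(size,))
mm[:] = np.linspace(0, 1, size)
mm.flush()

# Sum the data one 100,000-element block at a time.
data = np.memmap(path, dtype='float32', mode='r', shape=(size,))
block = 100000
total = 0.0
for start in range(0, size, block):
    # Accumulate in float64 to avoid float32 rounding drift.
    total += float(data[start:start + block].sum(dtype='float64'))

print(round(total))  # mean is 0.5, so the sum is about size / 2 = 500000
```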
Important Notes
Memory mapping works best with binary files, not text files.
Always specify the correct data type and shape to avoid errors.
Use chunk reading for large text files like CSVs, then convert to numpy arrays.
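When the element count is unknown, the shape can be derived from the file size and the dtype's item size instead of guessed. A short sketch with an illustrative temporary file:

```python
import numpy as np
import tempfile, os

# Create a binary file whose length we pretend not to know.
path = os.path.join(tempfile.mkdtemp(), 'unknown.dat')
np.arange(123, dtype='float32').tofile(path)

# Derive the element count: total bytes divided by bytes per element.
dtype = np.dtype('float32')
n = os.path.getsize(path) // dtype.itemsize
arr = np.memmap(path, dtype=dtype, mode='r', shape=(n,))
print(n)  # 123
```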
Summary
Use np.memmap to work with large binary files without loading all data.
Read or write data in parts to save memory and speed up processing.
For large text files, read in chunks and convert to numpy arrays for analysis.