Memory-mapped arrays for large data in NumPy

Memory-mapped arrays let you work with data files that are too big to fit in your computer's RAM: the array lives on disk, and only the parts you access are loaded into memory.
import numpy as np

# Create or load a memory-mapped array
memmap_array = np.memmap(filename, dtype=data_type, mode='r+', shape=(rows, columns))
filename is the path to the file on disk.
dtype is the data type of the array elements (e.g., np.float64).
mode can be 'r' (read-only), 'r+' (read-write), 'w+' (create or overwrite), or 'c' (copy-on-write).
shape defines the dimensions of the array.
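The modes differ in how writes are handled; 'c' (copy-on-write) is the least obvious, because assignments modify the in-memory array but never touch the file. A minimal sketch (the filename 'cow_demo.dat' is made up for illustration):

```python
import numpy as np

# Create a small file on disk first (mode='w+' allocates and zero-fills it)
base = np.memmap('cow_demo.dat', dtype=np.float64, mode='w+', shape=(4,))
base[:] = 1.0
base.flush()

# Open the same file in copy-on-write mode: writes stay in memory only
cow = np.memmap('cow_demo.dat', dtype=np.float64, mode='c', shape=(4,))
cow[0] = 99.0  # changes this array, not the file

# Re-reading the file shows the stored value is untouched
check = np.memmap('cow_demo.dat', dtype=np.float64, mode='r', shape=(4,))
print(cow[0], check[0])  # 99.0 1.0
```

Copy-on-write is handy for experimenting with data you must not corrupt, since no flush of your edits can ever reach the file.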
import numpy as np

# Example 1: Create a new memory-mapped array file
memmap_array = np.memmap('data.dat', dtype=np.float64, mode='w+', shape=(1000, 1000))
memmap_array[:] = 0  # Initialize with zeros
memmap_array.flush()  # Save changes to disk
import numpy as np

# Example 2: Load an existing memory-mapped array in read-only mode
memmap_array = np.memmap('data.dat', dtype=np.float64, mode='r', shape=(1000, 1000))
print(memmap_array[0, 0])
import numpy as np

# Example 3: Edge case - file does not exist with mode='r+'
try:
    memmap_array = np.memmap('missing.dat', dtype=np.float64, mode='r+', shape=(10, 10))
except FileNotFoundError as error:
    print('File not found:', error)
import numpy as np

# Example 4: Edge case - single-element array
memmap_array = np.memmap('single_element.dat', dtype=np.int32, mode='w+', shape=(1,))
memmap_array[0] = 42
memmap_array.flush()
print(memmap_array[0])
This program creates a large memory-mapped array file, initializes it, modifies a small block, and then reads it back in read-only mode to confirm the changes.
import numpy as np
import os

filename = 'large_data.dat'

# Step 1: Create a memory-mapped array file (100x100) filled with zeros
if os.path.exists(filename):
    os.remove(filename)  # Remove if it exists to start fresh
large_memmap = np.memmap(filename, dtype=np.float64, mode='w+', shape=(100, 100))
large_memmap[:] = 0
large_memmap.flush()
print('Initial sum of all elements:', large_memmap.sum())

# Step 2: Modify a small part of the array
large_memmap[10:15, 10:15] = 5.5
large_memmap.flush()
print('Sum after modification:', large_memmap.sum())

# Step 3: Load the same file in read-only mode and check values
loaded_memmap = np.memmap(filename, dtype=np.float64, mode='r', shape=(100, 100))
print('Value at (12, 12):', loaded_memmap[12, 12])
print('Sum from loaded array:', loaded_memmap.sum())
Time complexity: Accessing or modifying an element costs the same as in a normal array once its page is in memory, but the first access to a region triggers disk I/O; creating the file takes time proportional to its size.
Space complexity: Uses disk space equal to the array size; RAM usage is low because data is loaded on demand.
Common mistake: Forgetting to call flush() to save changes to disk when using write modes.
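To show why flush() matters, the sketch below writes through a memmap, flushes, and then re-reads the raw file with a fresh memmap to confirm the bytes actually landed on disk (the filename 'flush_demo.dat' is illustrative):

```python
import numpy as np

# Write through a memmap, then flush so the bytes reach the disk file
mm = np.memmap('flush_demo.dat', dtype=np.int32, mode='w+', shape=(3,))
mm[:] = [1, 2, 3]
mm.flush()   # push pending changes to disk now
del mm       # dropping the last reference also closes the mapping

# Open the same file with a fresh memmap to confirm the data persisted
readback = np.memmap('flush_demo.dat', dtype=np.int32, mode='r', shape=(3,))
print(list(readback))  # [1, 2, 3]
```

The operating system may eventually write dirty pages on its own, but an explicit flush() is the only way to guarantee the file is up to date at a known point in the program.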
Use memory-mapped arrays when data is too large for RAM. For small data, normal numpy arrays are simpler and faster.
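When the file really is bigger than RAM, the usual pattern is to process the memmap in blocks so only one block is resident at a time. A minimal sketch (the filename, block size, and shape are made up for illustration):

```python
import numpy as np

# Build a sample file; in practice this file would already exist on disk
data = np.memmap('chunk_demo.dat', dtype=np.float64, mode='w+', shape=(1000, 100))
data[:] = 1.0
data.flush()

# Process the array in row blocks so only one block is in RAM at a time
block_rows = 200
total = 0.0
for start in range(0, data.shape[0], block_rows):
    block = np.asarray(data[start:start + block_rows])  # copy one block into RAM
    total += block.sum()

print(total)  # 100000.0 (1000 * 100 elements, all 1.0)
```

Calling data.sum() directly would also work, since NumPy reads pages on demand, but explicit blocking keeps the working set predictable.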
Memory-mapped arrays let you work with large data files without loading all data into memory.
They behave like normal numpy arrays but store data on disk, saving RAM.
Remember to flush changes to save them, and handle file existence carefully.