
Memory-mapped arrays for large data in NumPy

Introduction

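Memory-mapped arrays let you work with very large data files without loading everything into memory. The array is backed by a file on disk, and the operating system reads in only the parts you actually access. This helps when your data is too large to fit in your computer's RAM.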

You have a huge dataset saved on disk and want to analyze parts of it without loading all at once.
You want to share large data between programs without copying it into memory multiple times.
You need to process data that is bigger than your computer's available RAM.
You want faster access to parts of a large file without reading the entire file.
You want to save memory when working with large arrays in scientific computing.
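The use cases above mostly come down to touching one slice of the file at a time. Here is a minimal sketch of that pattern, assuming a hypothetical file name (chunk_demo.dat) and deliberately small sizes so it runs quickly; the chunk size is an arbitrary choice, not a recommendation.

```python
import os
import tempfile

import numpy as np

# Hypothetical file name and sizes, kept small so the sketch runs quickly.
path = os.path.join(tempfile.mkdtemp(), "chunk_demo.dat")
n_rows, n_cols = 10_000, 100

# Write some known data to disk first.
writer = np.memmap(path, dtype=np.float64, mode="w+", shape=(n_rows, n_cols))
writer[:] = 1.0
writer.flush()
del writer

# Reopen read-only and reduce the data one block of rows at a time,
# so only the current block has to be read from disk.
data = np.memmap(path, dtype=np.float64, mode="r", shape=(n_rows, n_cols))
chunk_rows = 1_000  # arbitrary block size for illustration
total = 0.0
for start in range(0, n_rows, chunk_rows):
    block = data[start:start + chunk_rows]
    total += float(block.sum())

print(total)  # 1000000.0
```

With a file that really is larger than RAM, the same loop applies; only the shape and chunk size change.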
Syntax
NumPy
import numpy as np

# Create or load a memory-mapped array
memmap_array = np.memmap(filename, dtype=data_type, mode='r+', shape=(rows, columns))

filename is the path to the file on disk.

dtype is the data type of the array elements (e.g., np.float64).

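mode can be 'r' (read-only), 'r+' (read-write on an existing file; this is the default), 'w+' (create a new file or overwrite an existing one), or 'c' (copy-on-write: changes stay in memory and are not written to the file).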

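shape defines the dimensions of the array. The file holds only raw bytes and does not record dtype or shape, so you must pass the same values every time you open it. If you omit shape, you get a 1-D array whose length is taken from the file size.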
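To make the modes more concrete, the sketch below contrasts 'c' (copy-on-write) with what is actually stored in the file, and also opens the file without a shape. The file name modes_demo.dat and the values are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical file name used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.dat")

# 'w+' creates the file; write known values and flush them to disk.
original = np.memmap(path, dtype=np.int32, mode="w+", shape=(4,))
original[:] = [1, 2, 3, 4]
original.flush()
del original

# 'c' (copy-on-write): assignments change the in-memory view only.
cow = np.memmap(path, dtype=np.int32, mode="c", shape=(4,))
cow[0] = 99
cow_first = int(cow[0])
del cow

# Reopen read-only without a shape: the result is 1-D, sized from the file.
check = np.memmap(path, dtype=np.int32, mode="r")
on_disk = check.tolist()

print(cow_first)  # 99
print(on_disk)    # [1, 2, 3, 4]
```

The copy-on-write view saw the new value, but the file kept the original data.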

Examples
This creates a new 1000x1000 array on disk filled with zeros.
NumPy
import numpy as np

# Example 1: Create a new memory-mapped array file
memmap_array = np.memmap('data.dat', dtype=np.float64, mode='w+', shape=(1000, 1000))
memmap_array[:] = 0  # Initialize with zeros
memmap_array.flush()  # Save changes to disk
This opens the existing file without allowing changes.
NumPy
import numpy as np

# Example 2: Load an existing memory-mapped array in read-only mode
memmap_array = np.memmap('data.dat', dtype=np.float64, mode='r', shape=(1000, 1000))
print(memmap_array[0, 0])
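As a quick check of what read-only means in practice, this sketch tries to assign into an array opened with mode='r'. It creates its own small file first (readonly_demo.dat is a made-up name) so that it can run on its own.

```python
import os
import tempfile

import numpy as np

# Hypothetical file name; created here so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "readonly_demo.dat")

created = np.memmap(path, dtype=np.float64, mode="w+", shape=(3, 3))
created.flush()
del created

# Open the same file read-only.
readonly = np.memmap(path, dtype=np.float64, mode="r", shape=(3, 3))
print(readonly.flags.writeable)  # False

# Assigning to a read-only array raises ValueError.
try:
    readonly[0, 0] = 1.0
    write_failed = False
except ValueError as error:
    write_failed = True
    print("Write rejected:", error)
```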
Trying to open a nonexistent file in read-write mode ('r+') raises FileNotFoundError.
NumPy
import numpy as np

# Example 3: Edge case - file does not exist with mode='r+'
try:
    memmap_array = np.memmap('missing.dat', dtype=np.float64, mode='r+', shape=(10, 10))
except FileNotFoundError as error:
    print('File not found:', error)
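One possible way to avoid this error is to pick the mode based on whether the file already exists. The helper open_or_create below is hypothetical (it is not part of NumPy), and the sketch assumes that an existing file was written with the same dtype and shape.

```python
import os
import tempfile

import numpy as np


def open_or_create(path, dtype, shape):
    """Hypothetical helper: open an existing file read-write, or create it."""
    # 'r+' keeps existing contents; 'w+' creates a new zero-filled file.
    mode = "r+" if os.path.exists(path) else "w+"
    return np.memmap(path, dtype=dtype, mode=mode, shape=shape)


# Made-up file name in a fresh temporary directory, so it does not exist yet.
path = os.path.join(tempfile.mkdtemp(), "maybe_missing.dat")

first = open_or_create(path, np.float64, (10, 10))  # file is created
first[0, 0] = 7.0
first.flush()
del first

second = open_or_create(path, np.float64, (10, 10))  # file is reopened
value = float(second[0, 0])
print(value)  # 7.0
```

Choosing 'w+' unconditionally would be simpler, but it would overwrite any data already in the file.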
A memory-mapped array with only one element works fine.
NumPy
import numpy as np

# Example 4: Edge case - single element array
memmap_array = np.memmap('single_element.dat', dtype=np.int32, mode='w+', shape=(1,))
memmap_array[0] = 42
memmap_array.flush()
print(memmap_array[0])
Sample Program

This program creates a memory-mapped array file (kept small, at 100x100, so it runs quickly), initializes it, modifies a small block, and then reads it back in read-only mode to confirm the changes. The same steps apply to much larger arrays.

NumPy
import numpy as np
import os

filename = 'large_data.dat'

# Step 1: Create a memory-mapped array file (100x100, small for demonstration) filled with zeros
if os.path.exists(filename):
    os.remove(filename)  # Remove if exists to start fresh

large_memmap = np.memmap(filename, dtype=np.float64, mode='w+', shape=(100, 100))
large_memmap[:] = 0
large_memmap.flush()

print('Initial sum of all elements:', large_memmap.sum())

# Step 2: Modify a small part of the array
large_memmap[10:15, 10:15] = 5.5
large_memmap.flush()

print('Sum after modification:', large_memmap.sum())

# Step 3: Load the same file in read-only mode and check values
loaded_memmap = np.memmap(filename, dtype=np.float64, mode='r', shape=(100, 100))
print('Value at (12, 12):', loaded_memmap[12, 12])
print('Sum from loaded array:', loaded_memmap.sum())
Important Notes

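Time complexity: Indexing and slicing work the same way as with normal arrays, but the first access to a region has to read it from disk, so it can be much slower than RAM until the operating system has cached that part of the file. Initializing a new file takes time proportional to its size.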

Space complexity: The file size equals the array size (number of elements times bytes per element); RAM usage stays low because the operating system loads only the parts of the file you access.
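The size relationship can be checked directly. This is a small sketch with a made-up file name (size_demo.dat); note that os.path.getsize reports the logical file size, and some filesystems may allocate fewer physical blocks for a file that is all zeros.

```python
import os
import tempfile

import numpy as np

# Hypothetical file name used only for this check.
path = os.path.join(tempfile.mkdtemp(), "size_demo.dat")
shape = (1000, 1000)

arr = np.memmap(path, dtype=np.float64, mode="w+", shape=shape)
arr.flush()

# Expected size: number of elements times bytes per element.
expected_bytes = shape[0] * shape[1] * np.dtype(np.float64).itemsize
actual_bytes = os.path.getsize(path)

print(expected_bytes)  # 8000000
print(actual_bytes)    # 8000000
```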

Common mistake: Forgetting to call flush() after writing in 'r+' or 'w+' mode. Without an explicit flush(), you have no guarantee about when your changes reach the disk.

Use memory-mapped arrays when data is too large for RAM. For small data, normal NumPy arrays are simpler and faster.

Summary

Memory-mapped arrays let you work with large data files without loading all data into memory.

They behave like normal NumPy arrays but keep their data in a file on disk, saving RAM.

Remember to flush changes to save them, and handle file existence carefully.