Memory-mapped files with np.memmap in NumPy - Time & Space Complexity
When working with large data files, memory-mapped files let us access data without loading it all at once.
We want to know how the time to access data grows as the file size increases.
Analyze the time complexity of the following code snippet.
import numpy as np
# Create a memory-mapped file for a large array
filename = 'large_array.dat'
shape = (10000, 10000)
# Open memmap in read-write mode
mmap_array = np.memmap(filename, dtype='float64', mode='w+', shape=shape)
# Access a single element
value = mmap_array[5000, 5000]
# Modify a slice
mmap_array[1000:2000, 1000:2000] = 1.0
# Flush changes to disk
mmap_array.flush()
This code creates a large memory-mapped array, reads one element, modifies a slice, and saves changes.
Look at what repeats or takes time in this code.
- Primary operation: Accessing and modifying parts of the large array stored on disk.
- How many times: Reading one element is a single operation; modifying a slice touches many elements (here 1000 x 1000 = 1,000,000 elements).
Accessing one element stays quick no matter the file size.
Modifying a slice grows with the number of elements changed.
| Input Size (n x n) | Approx. Operations for Slice Modification |
|---|---|
| 2,000 x 2,000 | 1,000,000 |
| 10,000 x 10,000 | 1,000,000 |
| 50,000 x 50,000 | 1,000,000 |
Pattern observation: The time to modify stays roughly constant regardless of the total file size, growing linearly with the number of elements changed (area of the slice).
Time Complexity: O(k)
This means the time grows linearly with the number of elements accessed or modified, not the total file size.
[X] Wrong: "Accessing any part of a memory-mapped file always takes time proportional to the whole file size."
[OK] Correct: Memory-mapped files let you access small parts quickly without reading the entire file, so time depends on how much you touch, not the full size.
Understanding how memory-mapped files work helps you handle big data efficiently, a useful skill in many data science tasks.
What if we changed the slice size to the entire array? How would the time complexity change?