
Memory-mapped arrays for large data in NumPy - Time & Space Complexity

Time Complexity: Memory-mapped arrays for large data
O(k)
Understanding Time Complexity

When working with very large data files, memory-mapped arrays let us access data without loading it all at once.

We want to understand how the time to read or write data grows as the file size increases.

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import numpy as np

# Open an existing file as a memory-mapped array
# (mode='r+' requires the file to already exist; mode='w+' would create it)
filename = 'large_data.dat'
mm_array = np.memmap(filename, dtype='float64', mode='r+', shape=(1000000,))

# Access a slice of the array
slice_data = mm_array[1000:2000]

# Modify a value
mm_array[500] = 42.0

# Flush changes to disk
mm_array.flush()

This code opens a large file as a memory-mapped array, reads a slice, modifies one element, and saves changes.
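To see why this is efficient, note that opening the map itself reads no data: the file's pages are loaded lazily, on first access. A minimal runnable sketch (the filename and sizes are illustrative; `mode='w+'` is used so the backing file is created rather than assumed to exist):

```python
import os
import numpy as np

filename = 'large_data.dat'  # illustrative name
n = 1_000_000

# mode='w+' creates (or truncates) the backing file and maps it
mm = np.memmap(filename, dtype='float64', mode='w+', shape=(n,))

# The file occupies n * 8 bytes on disk, but opening it did not
# read those bytes into RAM; pages are loaded lazily on access.
print(os.path.getsize(filename))  # 8_000_000 bytes

mm[500] = 42.0   # touches a single page
mm.flush()       # writes dirty pages back to disk
del mm           # close the map

# Reopen read-only to confirm the write reached disk
check = np.memmap(filename, dtype='float64', mode='r', shape=(n,))
value = float(check[500])
del check
os.remove(filename)
```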

Identify Repeating Operations

Look for repeated actions that take time as data size grows.

  • Primary operation: Accessing or modifying elements in the memory-mapped array.
  • How many times: Depends on how many elements are read or written; accessed elements that are not already cached must be fetched from disk (in page-sized chunks).
How Execution Grows With Input

Accessing a small slice reads only that part from disk, so time grows with the slice size, not the whole file.

Elements Accessed (k)    Approx. Operations
10 elements              About 10 disk reads
100 elements             About 100 disk reads
1000 elements            About 1000 disk reads

Pattern observation: Time grows roughly linearly with the number of elements accessed, not the total file size.
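This pattern can be checked empirically by timing slice reads of increasing size. A rough sketch (timings will vary by machine, and the OS reads whole pages, typically 4 KiB, so very small slices cost about one page read each; the linear trend appears for slices larger than a page):

```python
import os
import time
import numpy as np

filename = 'large_data.dat'  # illustrative name
n = 1_000_000
mm = np.memmap(filename, dtype='float64', mode='w+', shape=(n,))
mm.flush()

# Read slices of increasing size k; the work done is proportional
# to k, not to the total file size n.
timings = {}
for k in (10, 100, 1000):
    start = time.perf_counter()
    data = np.array(mm[:k])   # copying forces the pages to be read
    timings[k] = time.perf_counter() - start

for k, t in timings.items():
    print(f"{k:>5} elements: {t:.6f} s")

del mm
os.remove(filename)
```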

Final Time Complexity

Time Complexity: O(k) where k is the number of elements accessed or modified.

This means the time depends on how much data you actually read or write, not the total size of the large file.

Common Mistake

[X] Wrong: "Accessing any part of a memory-mapped array always takes time proportional to the whole file size."

[OK] Correct: Memory mapping loads only the accessed parts into memory, so time depends on the accessed slice size, not the entire file.
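A small experiment makes this concrete: reading the same 1000-element slice from a small file and from a file 100 times larger touches the same number of pages in both cases. A sketch (file names and sizes are illustrative):

```python
import os
import numpy as np

# Two backing files of very different sizes
small, large = 'small.dat', 'large.dat'
np.memmap(small, dtype='float64', mode='w+', shape=(100_000,)).flush()
np.memmap(large, dtype='float64', mode='w+', shape=(10_000_000,)).flush()

# Reading the same 1000-element slice costs the same amount of
# work in both cases; the total file size is irrelevant.
a = np.array(np.memmap(small, dtype='float64', mode='r', shape=(100_000,))[1000:2000])
b = np.array(np.memmap(large, dtype='float64', mode='r', shape=(10_000_000,))[1000:2000])

print(a.shape, b.shape)  # both (1000,)

os.remove(small)
os.remove(large)
```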

Interview Connect

Understanding memory-mapped arrays shows you can handle big data efficiently by reading only what you need, a useful skill in real data science work.

Self-Check

What if we changed the access pattern to read the entire array sequentially? How would the time complexity change?
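One way to explore this question: a full sequential read must visit every page of the file, so k equals n and the cost becomes O(n) in the total file size. A minimal sketch (filename and size are illustrative; `mode='w+'` zero-fills the new file):

```python
import os
import numpy as np

filename = 'large_data.dat'  # illustrative name
n = 1_000_000
mm = np.memmap(filename, dtype='float64', mode='w+', shape=(n,))
mm.flush()

# Summing forces every element to be read from disk:
# the work is O(n) in the total number of elements.
total = float(np.sum(mm))

print(total)  # 0.0 for a freshly created (zero-filled) file

del mm
os.remove(filename)
```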