
Memory-mapped arrays for large data in NumPy - Step-by-Step Execution

Concept Flow - Memory-mapped arrays for large data
Create or open large file
Memory-map file as numpy array
Access array data directly from disk
Read or write parts of array
Flush changes to disk with flush()
Delete the memmap object to release the file
This flow shows how a large file is memory-mapped as a NumPy array, allowing direct access to data on disk without loading all of it into memory.
Execution Sample
NumPy
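import numpy as np

filename = 'large_data.dat'

# Step 1: create a new file and map it as a 3x3 float32 array (initially zeros)
arr = np.memmap(filename, dtype='float32', mode='w+', shape=(3, 3))
# Step 2: write the values 0 to 8 into the mapped array
arr[:] = np.arange(9).reshape(3, 3)
# Step 3: force pending changes out to the file
arr.flush()
# Step 4: np.memmap has no close(); deleting the object releases the mapping
del arr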
This code creates a 3x3 memory-mapped array, writes numbers 0 to 8, and saves changes to disk.
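As a rough sketch of what happens next, the file can be reopened read-only. The backing file holds raw bytes only, so dtype and shape have to be supplied again. The temporary directory used here is an assumption made to keep the example self-contained; the sample above writes to the current directory instead.

```python
import os
import tempfile

import numpy as np

# Assumption: write into a temporary directory instead of the working directory.
path = os.path.join(tempfile.mkdtemp(), "large_data.dat")

# Same steps as the execution sample.
arr = np.memmap(path, dtype="float32", mode="w+", shape=(3, 3))
arr[:] = np.arange(9).reshape(3, 3)
arr.flush()
del arr  # release the writable mapping

# The file has no header, so dtype and shape must be given again.
readback = np.memmap(path, dtype="float32", mode="r", shape=(3, 3))
file_size = os.path.getsize(path)  # 9 values * 4 bytes per float32

print(readback[1, 2])
print(file_size)
```

Passing a different dtype or shape when reopening would reinterpret the same bytes, so it is worth recording both alongside the file.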
Execution Table
Step | Action | Array Content | Disk Write | Notes
1 | Create memmap array with zeros | [[0. 0. 0.] [0. 0. 0.] [0. 0. 0.]] | No | Array backed by file, initially zeros
2 | Assign values 0 to 8 reshaped | [[0. 1. 2.] [3. 4. 5.] [6. 7. 8.]] | No | Values set in the mapped array in memory; not yet forced to disk
3 | Flush changes to disk | [[0. 1. 2.] [3. 4. 5.] [6. 7. 8.]] | Yes | Data written to file on disk
4 | Close memmap object (del arr) | [[0. 1. 2.] [3. 4. 5.] [6. 7. 8.]] | No | Mapping released on deletion; np.memmap has no close() method
💡 All data written to disk and memmap closed
Variable Tracker
Variable | Start | After Step 1 | After Step 2 | After Step 3 | After Step 4
arr | None | [[0. 0. 0.] [0. 0. 0.] [0. 0. 0.]] | [[0. 1. 2.] [3. 4. 5.] [6. 7. 8.]] | [[0. 1. 2.] [3. 4. 5.] [6. 7. 8.]] | Closed
Key Moments - 3 Insights
Why doesn't assigning values to the memmap array immediately write to disk?
Because assignments modify the mapped pages in memory first. The operating system writes them back to the file later, at a time of its choosing. The flush() method forces any pending changes to disk, as shown in step 3 of the execution table.
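A minimal sketch of the flush step, assuming a hypothetical file name flush_demo.dat in a temporary directory. After flush() it reads the file through np.fromfile, a path that does not go through the mapping. It does not try to show the file's state before flush(), since the operating system may write mapped pages back at any time.

```python
import os
import tempfile

import numpy as np

# Hypothetical file name, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "flush_demo.dat")

arr = np.memmap(path, dtype="float32", mode="w+", shape=(3, 3))
arr[:] = np.arange(9).reshape(3, 3)

# Force pending changes to the file before anything else reads it.
arr.flush()

# Read the raw bytes back without using the memmap object.
on_disk = np.fromfile(path, dtype="float32").reshape(3, 3)
same = np.array_equal(on_disk, arr)
print(same)
```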
What happens if the file backing the memmap does not exist when mode='r+' is used?
An error occurs because mode='r+' requires the file to exist. You must create the file first or use mode='w+' to create a new file, as in step 1.
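The difference between the two modes can be sketched as follows. The file name is hypothetical, and the example assumes the error raised for the missing file is FileNotFoundError, which is what Python's built-in open() raises when asked to open a missing file for reading and writing.

```python
import os
import tempfile

import numpy as np

# Hypothetical path that does not exist yet.
path = os.path.join(tempfile.mkdtemp(), "not_created_yet.dat")

# mode='r+' opens an existing file for reading and writing; it does not create one.
try:
    np.memmap(path, dtype="float32", mode="r+", shape=(3, 3))
    opened = True
except FileNotFoundError:
    opened = False

# mode='w+' creates (or overwrites) the file.
created = np.memmap(path, dtype="float32", mode="w+", shape=(3, 3))
created[0, 0] = 42.0
created.flush()
del created

# Now that the file exists, mode='r+' works and sees the stored value.
reopened = np.memmap(path, dtype="float32", mode="r+", shape=(3, 3))
first_value = float(reopened[0, 0])
del reopened
```

Note that mode='w+' overwrites an existing file, so it is not a safe fallback when the file may already hold data.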
How does memmap help with large data compared to normal numpy arrays?
A memmap reads only the parts of the file you index, rather than loading the whole file into RAM, so it can handle data larger than memory. This is reflected in step 1, where the array is backed by a file rather than by an in-memory buffer.
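One way to use this is to process the file in chunks, so that only one slice is held in ordinary memory at a time. The sketch below uses a small array so it runs quickly; the size n, the chunk length, and the file name are all illustrative assumptions, and the same pattern would apply to a file far larger than RAM.

```python
import os
import tempfile

import numpy as np

# Illustrative sizes only; a real dataset would be much larger.
n = 1_000_000
chunk = 100_000
path = os.path.join(tempfile.mkdtemp(), "chunked_demo.dat")

# Fill the file one chunk at a time.
data = np.memmap(path, dtype="float32", mode="w+", shape=(n,))
for start in range(0, n, chunk):
    stop = min(start + chunk, n)
    data[start:stop] = np.arange(start, stop, dtype="float32")
data.flush()
del data

# Reopen read-only and reduce chunk by chunk.
ro = np.memmap(path, dtype="float32", mode="r", shape=(n,))
total = 0.0
for start in range(0, n, chunk):
    stop = min(start + chunk, n)
    # Accumulate in float64 so the running sum stays exact for these values.
    total += float(ro[start:stop].sum(dtype="float64"))
del ro

print(total)
```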
Visual Quiz - 3 Questions
Test your understanding
In the execution table at step 2, what is the value of arr[1,2]?
A. 5.0
B. 3.0
C. 8.0
D. 0.0
💡 Hint
Check the Array Content column at step 2 in the execution table
At which step are the changes saved to disk?
A. Step 1
B. Step 2
C. Step 3
D. Step 4
💡 Hint
Look at the Disk Write column in the execution table
If we skip calling flush(), what can we say about the data on disk right after the assignment?
A. It is guaranteed to be updated immediately
B. It may not be updated yet
C. File is deleted
D. Program crashes
💡 Hint
Refer to the Key Moments section about when data is written to disk
Concept Snapshot
Memory-mapped arrays let numpy access large data files on disk as arrays.
Use np.memmap(filename, dtype, mode, shape) to create.
Changes are made in memory first; flush() forces them to disk.
Good for data too big for RAM.
Access like normal numpy arrays.
Full Transcript
Memory-mapped arrays let NumPy work with large data files by mapping them directly to arrays without loading all the data into memory. You create a memmap array by specifying a filename, data type, mode, and shape. Initially, the array reflects the file's contents, or zeros if the file is new. You can then assign values to it as you would to any NumPy array. These changes are not guaranteed to reach the disk until you call flush(). np.memmap has no close() method; deleting the memmap object flushes any remaining changes and releases the mapping. This approach suits very large datasets that do not fit in RAM, since it gives efficient read and write access to the parts of the file you actually touch.