Memory-mapped arrays for large data in NumPy - Step-by-Step Execution

Concept Flow - Memory-mapped arrays for large data

Create or open large file

↓

Memory-map file as numpy array

↓

Access array data directly from disk

↓

Read or write parts of array

↓

Changes written back to disk (flush() forces it)

↓

Release the memmap (del arr)

This flow shows how a large file is memory-mapped as a NumPy array, allowing direct access to data on disk without loading all of it into memory.

Execution Sample

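import numpy as np

filename = 'large_data.dat'

# Create a 3x3 float32 array backed by the file; mode='w+' creates or overwrites it
arr = np.memmap(filename, dtype='float32', mode='w+', shape=(3, 3))
arr[:] = np.arange(9).reshape(3, 3)  # write the values 0..8 into the mapped pages
arr.flush()  # ask the OS to write any pending changes to the file
del arr      # drop the reference; np.memmap has no close() method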

This code creates a 3x3 memory-mapped array, writes numbers 0 to 8, and saves changes to disk.
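
To check that the values really reached the file, the array can be reopened read-only. The sketch below repeats the sample above in a temporary directory (the path is an assumption made so the example cleans no real data) and reads the file back. Because the file holds raw bytes only, dtype and shape have to be supplied again and must match what was written.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch location; any writable path works.
tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, "large_data.dat")

# Write the same 3x3 float32 array as in the sample above.
arr = np.memmap(filename, dtype="float32", mode="w+", shape=(3, 3))
arr[:] = np.arange(9).reshape(3, 3)
arr.flush()
del arr  # drop the reference so the mapping can be released

# Reopen read-only; the file stores no dtype or shape information.
readback = np.memmap(filename, dtype="float32", mode="r", shape=(3, 3))
file_size = os.path.getsize(filename)  # 9 values * 4 bytes each

print(readback)
print(file_size)
```

The file size is simply the number of elements times the item size, which is a quick sanity check that dtype and shape are consistent with the file.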

Execution Table

Step	Action	Array Content	Disk Write	Notes
1	Create memmap array with mode='w+'	[[0. 0. 0.] [0. 0. 0.] [0. 0. 0.]]	File created	Array backed by a new 36-byte file, initially zeros
2	Assign values 0 to 8 reshaped	[[0. 1. 2.] [3. 4. 5.] [6. 7. 8.]]	Not guaranteed	Values set in the mapped pages; the OS may write them back at any time
3	Flush changes to disk	[[0. 1. 2.] [3. 4. 5.] [6. 7. 8.]]	Yes	Pending changes written to the file on disk
4	Delete memmap reference (del arr)	[[0. 1. 2.] [3. 4. 5.] [6. 7. 8.]]	No	Already flushed; mapping released, file still holds these values

💡 All data is written to disk and the memmap is released

Variable Tracker

Variable	Start	After Step 1	After Step 2	After Step 3	After Step 4
arr	None	[[0. 0. 0.] [0. 0. 0.] [0. 0. 0.]]	[[0. 1. 2.] [3. 4. 5.] [6. 7. 8.]]	[[0. 1. 2.] [3. 4. 5.] [6. 7. 8.]]	Deleted (mapping released)

Key Moments - 3 Insights

Why doesn't assigning values to the memmap array immediately write to disk?

What happens if the file backing the memmap does not exist when mode='r+' is used?
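
A minimal sketch of this case, assuming a path that does not exist yet (the file name is hypothetical): mode='r+' opens an existing file for reading and writing and does not create one, so the call raises FileNotFoundError. Falling back to mode='w+' is one way to create the file instead.

```python
import os
import tempfile

import numpy as np

# Hypothetical path inside a fresh temporary directory, so the file is missing.
missing = os.path.join(tempfile.mkdtemp(), "does_not_exist.dat")

try:
    arr = np.memmap(missing, dtype="float32", mode="r+", shape=(3, 3))
    outcome = "opened"
except FileNotFoundError:
    outcome = "missing"
    # mode='w+' creates the file (or overwrites an existing one).
    arr = np.memmap(missing, dtype="float32", mode="w+", shape=(3, 3))

print(outcome)
print(os.path.exists(missing))
```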

How does memmap help with large data compared to normal numpy arrays?
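
One common pattern, sketched here under assumed sizes, is to walk over the mapped array in slices so that only one block is resident at a time. The array below is deliberately small so the example runs quickly; the names n_rows, n_cols and chunk_rows are illustrative, and the same loop applies to files far larger than RAM.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "big.dat")  # hypothetical scratch file
n_rows, n_cols = 10_000, 100  # small stand-in for a much larger dataset
chunk_rows = 1_000

# Fill the file block by block instead of building the full array in RAM.
data = np.memmap(path, dtype="float32", mode="w+", shape=(n_rows, n_cols))
for start in range(0, n_rows, chunk_rows):
    data[start:start + chunk_rows] = 1.0
data.flush()
del data

# Reopen read-only and reduce one block at a time.
data = np.memmap(path, dtype="float32", mode="r", shape=(n_rows, n_cols))
total = 0.0
for start in range(0, n_rows, chunk_rows):
    block = data[start:start + chunk_rows]  # a view; pages load when touched
    total += float(block.sum(dtype=np.float64))

print(total)
```

A regular np.ndarray of the same shape would need the whole dataset in memory before the first sum could run; here the operating system pages data in as each slice is read.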

Visual Quiz - 3 Questions

Test your understanding

In the execution table at step 2, what is the value of arr[1, 2]?

A. 5.0

B. 3.0

C. 8.0

D. 0.0

Concept Snapshot

Memory-mapped arrays let numpy access large data files on disk as arrays.
Use np.memmap(filename, dtype, mode, shape) to create or open one.
Writes go to the mapped pages; flush() forces them to disk.
Good for data too big for RAM.
Access like normal numpy arrays.
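
The points above can be sketched as a partial update of an existing file. The example assumes a raw float32 file written with ndarray.tofile (the path is hypothetical), opens it with mode='r+', changes a single row, and then reads the file back with np.fromfile to confirm the change is on disk.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "snapshot.dat")  # hypothetical file
np.arange(9, dtype="float32").tofile(path)  # existing raw data: 0..8

# Map the existing file for reading and writing.
arr = np.memmap(path, dtype="float32", mode="r+", shape=(3, 3))
arr[1, :] = -1.0  # ordinary NumPy indexing; only the middle row changes
arr.flush()
del arr

# Read the raw bytes back without memmap to verify what is on disk.
on_disk = np.fromfile(path, dtype="float32").reshape(3, 3)
print(on_disk)
```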

Full Transcript

Memory-mapped arrays allow NumPy to work with large data files by mapping them directly to arrays without loading all of the data into memory. You create a memmap array by specifying a filename, data type, mode, and shape. Initially, the array reflects the existing file content, or zeros if the file was just created with mode='w+'. Assigning values modifies the mapped pages in memory. Those changes are not guaranteed to be on disk until you call flush(). np.memmap has no close() method; deleting the last reference to the array (del arr) flushes pending changes and releases the mapping. This approach is useful for very large datasets that do not fit in RAM, because only the parts of the file you actually read or write are brought into memory.