
Why Memory-mapped arrays for large data in NumPy? - Purpose & Use Cases

The Big Idea

What if you could explore giant datasets without your computer freezing or crashing?

The Scenario

Imagine you have a huge spreadsheet with millions of rows. You want to analyze it, but your computer's memory is too small to open the whole file at once.

You try to load it all into your program, but it crashes or becomes very slow.

The Problem

Loading all data into memory at once is slow and can cause your program to crash.

It wastes time waiting for the computer to swap data in and out of memory.

Manual splitting or chunking is complicated and error-prone.

The Solution

Memory-mapped arrays let you work with large data files as if they were in memory, but only load small parts when needed.

This saves memory and speeds up processing without crashing your program.
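To make this concrete, here is a minimal sketch of how a memory-mapped array is created and read back. The file path and shape are made up for illustration; `mode='w+'` creates the file on disk, and reopening with `mode='r'` gives read-only, on-demand access.

```python
import numpy as np
import tempfile
import os

# Hypothetical file location; in practice this would be your data file.
path = os.path.join(tempfile.mkdtemp(), 'demo.dat')

# mode='w+' creates the file on disk and maps it for reading and writing.
mm = np.memmap(path, dtype='float32', mode='w+', shape=(1000, 10))
mm[:] = 1.0   # writes go to the mapped file, not a full in-memory copy
mm.flush()    # push any pending changes out to disk

# Reopen read-only; only the parts you actually touch are read into memory.
ro = np.memmap(path, dtype='float32', mode='r', shape=(1000, 10))
print(ro[0, 0])  # -> 1.0
```

The key point: `mm` behaves like an ordinary NumPy array, but it is backed by the file rather than by RAM.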

Before vs After
Before
import numpy as np
data = np.load('large_file.npy')  # loads entire file into memory
After
import numpy as np
data = np.load('large_file.npy', mmap_mode='r')  # memory-maps the file; data is read on demand

Note: np.memmap(filename, dtype=..., mode='r', shape=...) does the same for raw binary files, but .npy files have a format header, so np.load with mmap_mode is the safe way to map them.
What It Enables

You can analyze huge datasets on a normal computer without running out of memory or waiting forever.
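A common pattern is to stream statistics over the mapped array in fixed-size chunks, so that no more than one chunk is resident at a time. This is a sketch with a made-up file and chunk size; the data is generated on the spot so the example is self-contained.

```python
import numpy as np
import tempfile
import os

# Hypothetical setup: write an array to disk so we have something to map.
path = os.path.join(tempfile.mkdtemp(), 'big.dat')
np.arange(1_000_000, dtype='float32').tofile(path)

data = np.memmap(path, dtype='float32', mode='r', shape=(1_000_000,))

# Process in fixed-size chunks; each slice reads only that part of the file.
chunk = 100_000
total = 0.0
for start in range(0, data.shape[0], chunk):
    total += data[start:start + chunk].sum(dtype='float64')

mean = total / data.shape[0]
print(mean)  # mean of 0..999999 is 499999.5
```

Accumulating in float64 avoids rounding drift when summing many float32 values; the chunk size is a tuning knob, not a requirement.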

Real Life Example

A data scientist working with terabytes of sensor data can quickly access and analyze parts of the data without loading everything at once.
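That workflow can be sketched as slicing just the window of interest out of a mapped log. The file, shape, and row range below are invented for the example; only the sliced rows are read from disk.

```python
import numpy as np
import tempfile
import os

# Hypothetical sensor log: rows are time samples, columns are channels.
path = os.path.join(tempfile.mkdtemp(), 'sensors.dat')
np.zeros((100_000, 8), dtype='float32').tofile(path)

log = np.memmap(path, dtype='float32', mode='r', shape=(100_000, 8))

# Pull only the rows for the window being analyzed; the rest stays on disk.
window = np.array(log[50_000:53_000])  # copy the slice into regular memory
print(window.shape)  # (3000, 8)
```

Copying the slice with np.array detaches it from the file, which is useful when you want to modify or keep the window after closing the map.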

Key Takeaways

Loading huge data fully into memory can crash or slow down programs.

Memory-mapped arrays load data only when needed, saving memory and time.

This technique makes working with very large datasets possible on normal computers.