
Masked arrays concept in NumPy

Introduction

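Masked arrays (the numpy.ma module) help you work with data that has missing or invalid values. Each masked array pairs the data with a boolean mask, so those values are ignored in calculations without being deleted.

Typical situations where they help: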
You have a dataset with some missing numbers and want to calculate the average without errors.
You want to hide or skip invalid data points in a large array during analysis.
You need to perform operations on data but want to keep track of which values are not valid.
You want to plot data but exclude certain points without removing them from the dataset.
Syntax
NumPy
import numpy as np

# data: a regular NumPy array or list
data = [1, 2, 3]
# mask_array: boolean array where True means the value is masked (ignored)
mask_array = [False, True, False]

# Create a masked array
masked_array = np.ma.array(data, mask=mask_array)

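The mask should have the same shape as the data. NumPy also accepts a single True or False (applied to every element), and a mask with the same number of elements but a different shape is reshaped to fit; any other size raises an error.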

Masked values are skipped by calculations such as mean(), sum(), min() and max().
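A minimal sketch comparing a plain array with a masked one (the values are arbitrary, chosen only for illustration). The reductions on the masked array only see the unmasked elements, and count() reports how many of those there are:

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])
# Mask the last element (True = ignored)
masked = np.ma.array(values, mask=[False, False, False, True])

# Plain ndarray reductions use every element
print(values.sum(), values.mean())   # 10.0 2.5
# Masked array reductions skip the masked element
print(masked.sum(), masked.mean())   # 6.0 2.0
# count() returns the number of unmasked elements
print(masked.count())                # 3
```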

Examples
This masks the 2nd and 5th values, so they are ignored.
NumPy
import numpy as np

# Example 1: Masking some values
values = np.array([1, 2, 3, 4, 5])
mask = np.array([False, True, False, False, True])
masked_values = np.ma.array(values, mask=mask)
print(masked_values)
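To see what Example 1 actually stores, this sketch prints the masked array and its parts: .data keeps the original values, .mask holds the boolean flags, and .compressed() returns only the unmasked values as a plain array.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])
mask = np.array([False, True, False, False, True])
masked_values = np.ma.array(values, mask=mask)

print(masked_values)               # [1 -- 3 4 --]  (masked entries print as --)
print(masked_values.data)          # [1 2 3 4 5]    (original values are kept)
print(masked_values.mask)          # [False  True False False  True]
print(masked_values.compressed())  # [1 3 4]        (only the valid values)
```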
No mask means all values are used.
NumPy
import numpy as np

# Example 2: Masked array with all values valid (no mask)
values = np.array([10, 20, 30])
masked_values = np.ma.array(values)
print(masked_values)
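When no mask is passed, as in Example 2, NumPy does not allocate a per-element boolean array; the mask is the special placeholder np.ma.nomask, which behaves like False. If you need an explicit mask with one flag per element, np.ma.getmaskarray() builds one. A short sketch:

```python
import numpy as np

values = np.array([10, 20, 30])
masked_values = np.ma.array(values)

print(masked_values)                                  # [10 20 30]
# No per-element mask is stored, only the nomask placeholder
print(np.ma.getmask(masked_values) is np.ma.nomask)   # True
# getmaskarray expands it to a full boolean array
print(np.ma.getmaskarray(masked_values))              # [False False False]
```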
All values are masked, so every element prints as --.
NumPy
import numpy as np

# Example 3: Masked array with all values masked
values = np.array([7, 8, 9])
mask = np.array([True, True, True])
masked_values = np.ma.array(values, mask=mask)
print(masked_values)
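When every element is masked, as in Example 3, there is nothing left to reduce. In this sketch, mean() returns the special constant np.ma.masked instead of a number, count() is 0, and compressed() gives an empty array:

```python
import numpy as np

values = np.array([7, 8, 9])
masked_values = np.ma.array(values, mask=[True, True, True])

print(masked_values)                         # [-- -- --]
print(masked_values.count())                 # 0 valid elements
print(masked_values.mean() is np.ma.masked)  # True: no value to average
print(masked_values.compressed())            # [] (empty plain array)
```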
This masks the first and last elements only.
NumPy
import numpy as np

# Example 4: Masking first and last elements
values = np.array([100, 200, 300, 400])
mask = np.array([True, False, False, True])
masked_values = np.ma.array(values, mask=mask)
print(masked_values)
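Writing masks out by hand gets tedious for long arrays. One option, sketched here, is to start from an all-False mask made with np.zeros(..., dtype=bool) and set only the positions to hide; this reproduces Example 4's mask for an array of any length:

```python
import numpy as np

values = np.array([100, 200, 300, 400])

# Start with nothing masked, then mask the first and last positions
mask = np.zeros(values.shape, dtype=bool)
mask[[0, -1]] = True

masked_values = np.ma.array(values, mask=mask)
print(masked_values)        # [-- 200 300 --]
print(masked_values.sum())  # 500 (only 200 + 300 are added)
```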
Sample Program

This program creates a masked array to ignore negative values in calculations. It prints the original data, the mask, the masked array, and the mean ignoring invalid values.

NumPy
import numpy as np

# Create a normal numpy array with some invalid data
data = np.array([10, -1, 20, -999, 30, 40])

# Define a mask where invalid data is True
# Here, we consider negative values as invalid
mask_invalid = data < 0

# Create a masked array
masked_data = np.ma.array(data, mask=mask_invalid)

print("Original data:", data)
print("Mask for invalid data:", mask_invalid)
print("Masked array:", masked_data)

# Calculate mean ignoring masked values
mean_value = masked_data.mean()
print(f"Mean ignoring invalid values: {mean_value}")
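For comparison, this sketch runs the same data through a plain mean and through np.ma.masked_less, a NumPy helper that builds the mask from a threshold (here, every value below 0). The plain mean is dragged far down by the invalid entries, while the masked mean averages only 10, 20, 30 and 40:

```python
import numpy as np

data = np.array([10, -1, 20, -999, 30, 40])

# Plain mean includes the invalid values: (10 - 1 + 20 - 999 + 30 + 40) / 6
print(data.mean())  # -150.0

# masked_less masks every element strictly less than 0
masked_data = np.ma.masked_less(data, 0)
print(masked_data)         # [10 -- 20 -- 30 40]
print(masked_data.mean())  # 25.0
```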
Important Notes

Time complexity for creating a masked array is O(n), where n is the number of elements.

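Space complexity is O(n) because the mask stores one boolean (1 byte) per element alongside the data. If you create a masked array without a mask, NumPy stores the placeholder np.ma.nomask instead of a full boolean array.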
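To put numbers on the space cost, this sketch compares nbytes for the data and the mask. The array size and the choice of which values to mask are arbitrary; the point is that a float64 value takes 8 bytes while each mask flag takes 1 byte:

```python
import numpy as np

n = 1_000  # arbitrary size for illustration
data = np.arange(n, dtype=np.float64)
# Mask every even value (any boolean array of length n would do)
masked = np.ma.array(data, mask=(data % 2 == 0))

print(masked.data.nbytes)  # 8000: 8 bytes per float64 value
print(masked.mask.nbytes)  # 1000: 1 byte per boolean flag
```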

A common mistake is passing a mask with a different number of elements than the data, which raises np.ma.MaskError.
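A sketch of that mistake: a 4-element mask for 3 values fails when the masked array is created. A single True or False is accepted, though, and applies to every element:

```python
import numpy as np

values = np.array([1, 2, 3])

mismatch_error = None
try:
    # 4 mask flags for 3 values: the sizes do not match
    np.ma.array(values, mask=[True, False, False, True])
except np.ma.MaskError as err:
    mismatch_error = err
    print("MaskError:", err)

# A single boolean is expanded to every element
all_hidden = np.ma.array(values, mask=True)
print(all_hidden)  # [-- -- --]
```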

Use masked arrays when you want to keep invalid data but exclude it from calculations. Use data cleaning if you want to remove invalid data completely.
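The difference shows up in the shape of the result. In this sketch, boolean indexing (data cleaning) returns a shorter array, while the masked array keeps all six positions; compressed() recovers the same cleaned values when you need them:

```python
import numpy as np

data = np.array([10, -1, 20, -999, 30, 40])

# Data cleaning: invalid values are gone, positions are lost
cleaned = data[data >= 0]
print(cleaned, cleaned.shape)   # [10 20 30 40] (4,)

# Masked array: invalid values are hidden, positions are kept
masked = np.ma.array(data, mask=data < 0)
print(masked, masked.shape)     # [10 -- 20 -- 30 40] (6,)
print(masked.compressed())      # [10 20 30 40]
```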

Summary

Masked arrays let you mark data as invalid without deleting it.

They help perform calculations ignoring invalid or missing values.

Always make sure the mask matches the data shape, and use masked arrays when the original data must stay intact.