
Set operations on structured data in NumPy

Introduction

Set operations help you find common or different items between groups of data. For structured data, this means comparing rows with multiple fields.

You want to find common records between two tables of data.
You need to find records in one dataset but not in another.
You want to combine two datasets without duplicates.
You want to find records that are unique to each dataset.
Syntax
NumPy
numpy.intersect1d(array1, array2)
numpy.union1d(array1, array2)
numpy.setdiff1d(array1, array2)
numpy.setxor1d(array1, array2)

These functions treat their inputs as 1D arrays and flatten anything with more dimensions. A 1D structured array fits this model because each record (row) counts as a single item.

Structured arrays have named fields, so you can compare rows as tuples.
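If your data starts out as a plain 2D array, one way to get rows that count as single items is to convert it to a structured array first. The sketch below assumes a small made-up table and uses hypothetical field names 'id' and 'score'; it relies on unstructured_to_structured from numpy.lib.recfunctions.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical plain 2D table: one row per record, columns are id and score
table = np.array([[1, 90], [2, 85], [3, 88]], dtype='i4')

# Assumed field names for this illustration
dt = np.dtype([('id', 'i4'), ('score', 'i4')])

# Each row of the 2D table becomes one record of a 1D structured array
records = rfn.unstructured_to_structured(table, dt)

print(table.shape)    # (3, 2)
print(records.shape)  # (3,)
print(records['id'])  # [1 2 3]
```

After the conversion, the set functions see three items (one per row) instead of six separate numbers.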

Examples
This finds rows that appear in both arrays.
NumPy
import numpy as np

# Define two structured arrays
arr1 = np.array([(1, 'A'), (2, 'B'), (3, 'C')], dtype=[('id', 'i4'), ('label', 'U1')])
arr2 = np.array([(2, 'B'), (3, 'C'), (4, 'D')], dtype=arr1.dtype)

# Find common rows
common = np.intersect1d(arr1, arr2)
print(common)
This finds rows in arr1 that are not in arr2.
NumPy
unique_to_arr1 = np.setdiff1d(arr1, arr2)
print(unique_to_arr1)
This combines both arrays without duplicates.
NumPy
all_unique = np.union1d(arr1, arr2)
print(all_unique)
This finds rows that are in one array or the other but not both.
NumPy
diff = np.setxor1d(arr1, arr2)
print(diff)
Sample Program

This program shows how to use set operations on structured arrays to find common, unique, combined, and different rows.

NumPy
import numpy as np

# Create two structured arrays with fields 'id' and 'score'
arr1 = np.array([(1, 90), (2, 85), (3, 88)], dtype=[('id', 'i4'), ('score', 'i4')])
arr2 = np.array([(2, 85), (3, 88), (4, 92)], dtype=arr1.dtype)

# Find common rows
common = np.intersect1d(arr1, arr2)
print('Common rows:')
print(common)

# Find rows unique to arr1
unique_arr1 = np.setdiff1d(arr1, arr2)
print('\nRows unique to arr1:')
print(unique_arr1)

# Combine all unique rows
all_unique = np.union1d(arr1, arr2)
print('\nAll unique rows combined:')
print(all_unique)

# Find rows in either arr1 or arr2 but not both
diff = np.setxor1d(arr1, arr2)
print('\nRows in either arr1 or arr2 but not both:')
print(diff)
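One way to sanity-check results like these is to compare them with Python's built-in set operators on the same rows converted to tuples. This is only a cross-check sketch that reuses the sample data; it is not how NumPy computes the results internally.

```python
import numpy as np

dt = [('id', 'i4'), ('score', 'i4')]
arr1 = np.array([(1, 90), (2, 85), (3, 88)], dtype=dt)
arr2 = np.array([(2, 85), (3, 88), (4, 92)], dtype=dt)

# tolist() turns each record into a plain Python tuple
s1, s2 = set(arr1.tolist()), set(arr2.tolist())

# Each NumPy set function should agree with the matching Python set operator
checks = {
    'intersect1d': set(np.intersect1d(arr1, arr2).tolist()) == (s1 & s2),
    'setdiff1d': set(np.setdiff1d(arr1, arr2).tolist()) == (s1 - s2),
    'union1d': set(np.union1d(arr1, arr2).tolist()) == (s1 | s2),
    'setxor1d': set(np.setxor1d(arr1, arr2).tolist()) == (s1 ^ s2),
}
print(checks)
print(sorted(s1 & s2))  # [(2, 85), (3, 88)]
```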
Important Notes

Structured arrays compare rows as whole records, so all fields must match to be considered equal.
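A small sketch of what whole-record matching means in practice. The data is made up: both arrays contain id 2, but with different scores, so that row is not treated as common.

```python
import numpy as np

dt = [('id', 'i4'), ('score', 'i4')]
left = np.array([(1, 90), (2, 85)], dtype=dt)
right = np.array([(1, 90), (2, 99)], dtype=dt)  # id 2 has a different score

# Only rows where every field matches are returned
common = np.intersect1d(left, right)
print(common)  # [(1, 90)]
```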

Set operations return sorted, unique results. Structured rows are sorted field by field, in the order the fields appear in the dtype.
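To illustrate the ordering of results, here is a sketch with made-up rows given out of order. The result comes back ordered by 'id' first, then by 'score', and the duplicate row appears once.

```python
import numpy as np

dt = [('id', 'i4'), ('score', 'i4')]
x = np.array([(3, 10), (1, 50)], dtype=dt)
y = np.array([(1, 20), (3, 10)], dtype=dt)

# Input order is not preserved; rows are sorted and (3, 10) is kept once
merged = np.union1d(x, y)
print(merged)  # [(1, 20) (1, 50) (3, 10)]
```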

If you want to compare only some fields, extract those fields first (for example, arr['id']) and run the set operation on the extracted values.
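A minimal sketch of comparing on a single field, assuming you want to match on 'id' and ignore 'score'. The data is made up so that id 2 has different scores in the two arrays. np.isin builds a mask that selects the full rows of arr1 whose id also appears in arr2.

```python
import numpy as np

dt = [('id', 'i4'), ('score', 'i4')]
arr1 = np.array([(1, 90), (2, 85), (3, 88)], dtype=dt)
arr2 = np.array([(2, 70), (3, 88), (4, 92)], dtype=dt)

# Compare the 'id' field only
shared_ids = np.intersect1d(arr1['id'], arr2['id'])
print(shared_ids)  # [2 3]

# Keep the full rows of arr1 whose id appears in arr2
mask = np.isin(arr1['id'], arr2['id'])
rows = arr1[mask]
print(rows)  # [(2, 85) (3, 88)]
```

Comparing whole records instead would match only (3, 88), because the scores for id 2 differ.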

Summary

Set operations help compare structured data by rows.

Use the NumPy functions intersect1d, union1d, setdiff1d, and setxor1d.

These operations are useful to find common, unique, or different records.