Record arrays in NumPy - Time & Space Complexity
We want to understand how the time to access and manipulate data in NumPy record arrays changes as the data size grows.
Specifically, how does the cost grow when working with structured data stored in record arrays?
Analyze the time complexity of the following code snippet.
```python
import numpy as np

# Define a record array with 3 fields
data = np.recarray(1000, dtype=[('name', 'U10'), ('age', 'i4'), ('score', 'f4')])

# Access the 'age' field for all records
ages = data.age

# Compute the average age
average_age = np.mean(ages)
```
This code creates a record array with 1000 entries, accesses one field for all records, and calculates the average of that field.
Identify the loops, recursion, or array traversals that repeat:
- Primary operation: Accessing the 'age' field for all 1000 records and computing the mean.
- How many times: The operation touches each record once, so 1000 times.
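The per-record work hidden inside `np.mean` can be made explicit with a plain Python loop (a sketch for illustration; NumPy performs the same traversal in compiled code, just much faster):

```python
import numpy as np

# Small record array with the same dtype as the example above
data = np.recarray(5, dtype=[('name', 'U10'), ('age', 'i4'), ('score', 'f4')])
data.age = [25, 30, 35, 40, 45]  # fill the field so the mean is well-defined

# Equivalent explicit traversal: one addition per record -> O(n)
total = 0
for record in data:
    total += record.age
average_age = total / len(data)

print(average_age)  # same result as np.mean(data.age)
```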
As the number of records grows, the total time to traverse and process them grows proportionally.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 operations to access and process |
| 100 | About 100 operations |
| 1000 | About 1000 operations |
Pattern observation: The operations grow linearly with the number of records.
Time Complexity: O(n)
This means the time to access and compute over the record array grows directly in proportion to the number of records.
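One way to check the linear trend empirically is to time the mean at a few sizes (a rough sketch; absolute timings vary by machine, but the elapsed time should grow roughly in proportion to n):

```python
import time
import numpy as np

means = {}
for n in [10_000, 100_000, 1_000_000]:
    data = np.recarray(n, dtype=[('name', 'U10'), ('age', 'i4'), ('score', 'f4')])
    data.age = np.arange(n)            # fill the field with known values
    start = time.perf_counter()
    means[n] = np.mean(data.age)       # O(n): every age value is read once
    elapsed = time.perf_counter() - start
    print(f"n={n:>9,}  mean={means[n]:.1f}  time={elapsed:.6f}s")
```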
[X] Wrong: "Because `data.age` looks like a simple attribute lookup, the whole computation is instant and does not depend on the number of records."
[OK] Correct: The field access itself is cheap (NumPy returns an O(1) view rather than copying), but `np.mean` must then read every record's `age` value, so the total time grows linearly with the number of records.
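The distinction can be verified directly: the field access returns a view that shares memory with the record array, so no per-record copy happens at access time (a small sketch using `np.shares_memory`):

```python
import numpy as np

data = np.recarray(1000, dtype=[('name', 'U10'), ('age', 'i4'), ('score', 'f4')])
data.age = np.arange(1000)

ages = data.age                      # O(1): a strided view, not a copy
print(np.shares_memory(ages, data))  # True: same underlying buffer

# Modifying the view modifies the original records
ages[0] = 99
print(data[0].age)                   # 99
```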
Understanding how structured data access scales helps you reason about performance in real data tasks and shows you can analyze array operations clearly.
"What if we accessed multiple fields at once instead of just one? How would the time complexity change?"