Creating structured arrays in NumPy - Performance & Efficiency
We want to understand how the time needed to create structured arrays changes as the data size grows.
How does the work increase when we add more records to the array?
Analyze the time complexity of the following code snippet.
```python
import numpy as np

# Structured dtype: a 10-char Unicode name, a 32-bit int age, a 32-bit float weight.
dtype = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]
data = np.zeros(1000, dtype=dtype)
for i in range(1000):
    data[i] = ('Alice', i % 100, 55.0 + i * 0.1)
```
This code creates a structured array with 1000 records and fills each record with data.
Identify the repeated operations: loops, recursion, or array traversals.
- Primary operation: The for-loop that assigns values to each element in the structured array.
- How many times: Exactly once for each of the 1000 records.
As the number of records increases, the time to fill the array grows proportionally.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 assignments |
| 100 | About 100 assignments |
| 1000 | About 1000 assignments |
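The linear pattern in the table can be checked empirically. Below is a minimal timing sketch (the sizes and the `fill_records` helper are chosen here just for illustration); doubling `n` should roughly double the elapsed time:

```python
import time
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]

def fill_records(n):
    """Create and fill a structured array of n records with a Python loop."""
    data = np.zeros(n, dtype=dtype)
    for i in range(n):
        data[i] = ('Alice', i % 100, 55.0 + i * 0.1)
    return data

# Time the fill at a few sizes to observe the linear trend.
for n in (10_000, 20_000, 40_000):
    start = time.perf_counter()
    fill_records(n)
    elapsed = time.perf_counter() - start
    print(f"n={n:>6}: {elapsed:.4f} s")
```

Exact times vary by machine, so look at the ratio between rows rather than the absolute numbers.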
Pattern observation: Doubling the number of records roughly doubles the work done.
Time Complexity: O(n)
This means the time to create and fill the structured array grows linearly with the number of records.
[X] Wrong: "Creating a structured array is instant and does not depend on size."
[OK] Correct: Each record must be assigned data, so the time grows with the number of records.
Understanding how data creation scales helps you explain performance in real data tasks, a useful skill in interviews and projects.
"What if we used vectorized assignment instead of a for-loop? How would the time complexity change?"