np.std() and np.var() for spread in NumPy - Time & Space Complexity
We want to know how the time to calculate spread measures changes as data grows.
How does the work increase when we use np.std() or np.var() on bigger arrays?
Analyze the time complexity of the following code snippet.
import numpy as np

n = 1000  # example size
arr = np.random.rand(n)  # array of n random values in [0, 1)
variance = np.var(arr)   # population variance (ddof=0 by default)
std_dev = np.std(arr)    # population standard deviation
This code creates an array of size n and calculates its variance and standard deviation.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: traversing every element of the array, first to compute the mean and then to sum the squared differences from that mean.
- How many times: each element is visited a constant number of times (roughly two full passes over n elements), so the total cost is proportional to n.
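To make those passes explicit, here is a sketch that computes the variance "by hand" and checks it against np.var. (np.var performs the same math; its internal implementation is optimized compiled code, but the amount of work is the same order.)

```python
import numpy as np

arr = np.random.rand(1000)

mean = arr.sum() / arr.size          # pass 1: one visit per element
sq_diff = (arr - mean) ** 2          # pass 2: one visit per element
variance = sq_diff.sum() / arr.size  # population variance (ddof=0)
std_dev = variance ** 0.5            # square root is a single O(1) step

assert np.isclose(variance, np.var(arr))
assert np.isclose(std_dev, np.std(arr))
```

Note that np.std is just np.var followed by one square root, so both functions share the same O(n) traversal cost.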
As the array size grows, the time to calculate variance or standard deviation grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 operations |
| 100 | About 100 operations |
| 1000 | About 1000 operations |
Pattern observation: Doubling the input roughly doubles the work needed.
Time Complexity: O(n)
This means the time to compute spread grows linearly with the number of data points.
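You can observe this linear growth directly with a rough timing sketch. Exact numbers depend on your machine, but doubling n should roughly double the runtime of np.var:

```python
import timeit
import numpy as np

times = []
for n in (1_000_000, 2_000_000, 4_000_000):
    arr = np.random.rand(n)
    # Time 20 repetitions of np.var on an array of size n.
    t = timeit.timeit(lambda: np.var(arr), number=20)
    times.append(t)
    print(f"n={n:>9,}  time={t:.4f}s")
```

Comparing adjacent rows of the output should show the time roughly doubling along with n, which is the signature of O(n) growth.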
[X] Wrong: "Calculating variance or standard deviation is instant no matter the data size."
[OK] Correct: The functions must look at every number to find the average and differences, so bigger data means more work.
Understanding how spread calculations scale helps you explain performance when working with large datasets in real projects.
"What if we calculate variance on a 2D array along one axis? How would the time complexity change?"