Correlation with np.correlate() in NumPy - Time & Space Complexity
We want to understand how the time it takes to compute correlation with numpy's np.correlate() changes as the input size grows.
Specifically, how does the work increase when the arrays get longer?
Analyze the time complexity of the following code snippet.
```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])   # longer array, length n = 5
y = np.array([0, 1, 0.5])       # shorter array, length m = 3
result = np.correlate(x, y, mode='full')
```
This code calculates the correlation between two arrays by sliding one over the other and, at each shift, multiplying and summing the overlapping elements.
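Running the snippet makes the shift structure visible: in `'full'` mode the output has one entry per shift position, so its length is n + m - 1 (here 5 + 3 - 1 = 7).

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([0, 1, 0.5])
result = np.correlate(x, y, mode='full')

print(result)       # one value per shift position
print(len(result))  # n + m - 1 = 5 + 3 - 1 = 7
```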
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Multiplying and summing overlapping elements of the two arrays for each shift position.
- How many times: For each shift, up to min(n, m) element pairs are multiplied and summed, and there are about n + m - 1 shift positions in total.
As the arrays get longer, the number of multiply-and-sum steps grows roughly with the product of their lengths.
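That product growth can be made explicit with a naive pure-Python version of `mode='full'` that counts its multiply-and-sum steps. This is a sketch for reasoning about the operation count, not how NumPy implements `np.correlate` internally:

```python
import numpy as np

def naive_correlate_full(a, v):
    """Sliding multiply-and-sum, mirroring np.correlate(a, v, mode='full')."""
    n, m = len(a), len(v)
    out = np.zeros(n + m - 1)
    ops = 0
    for k in range(n + m - 1):       # one pass per shift position
        for j in range(m):           # up to m overlapping elements per shift
            i = k - (m - 1) + j      # index into a for this overlap
            if 0 <= i < n:
                out[k] += a[i] * v[j]
                ops += 1
    return out, ops

x = np.array([1, 2, 3, 4, 5])
y = np.array([0, 1, 0.5])
mine, ops = naive_correlate_full(x, y)
print(mine)   # matches np.correlate(x, y, mode='full')
print(ops)    # 15 multiply-and-sum steps = n * m = 5 * 3
```

The total count comes out to exactly n * m: every element of one array is eventually paired with every element of the other across the shifts.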
| Input Size (n = m) | Approx. Operations |
|---|---|
| 10 | About 100 multiply-and-sum steps |
| 100 | About 10,000 multiply-and-sum steps |
| 1000 | About 1,000,000 multiply-and-sum steps |
Pattern observation: Doubling the input size roughly quadruples the work, consistent with growth proportional to the product of the two lengths.
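You can check this pattern with a rough timing sketch. Absolute times depend on your machine; only the ratio between successive runs matters, and it should be roughly 4x for each doubling:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
for n in (2000, 4000, 8000):   # double the size each step
    a = rng.random(n)
    b = rng.random(n)
    t0 = time.perf_counter()
    r = np.correlate(a, b, mode='full')
    dt = time.perf_counter() - t0
    print(f"n={n:5d}  len(result)={len(r):5d}  time={dt:.4f}s")
```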
Time Complexity: O(n * m)
This means the time to compute correlation grows roughly with the product of the lengths of the two input arrays.
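Note that the `mode` argument changes only how much of the sliding overlap is returned, not the asymptotic cost: `'full'` gives n + m - 1 values, `'same'` gives max(n, m), and `'valid'` gives max(n, m) - min(n, m) + 1. A quick check with the arrays from the snippet:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])   # n = 5
y = np.array([0, 1, 0.5])       # m = 3
for mode in ("full", "same", "valid"):
    print(mode, len(np.correlate(x, y, mode=mode)))
# full -> 7, same -> 5, valid -> 3
```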
[X] Wrong: "The correlation time grows only with the length of one array, not both."
[OK] Correct: The calculation involves sliding one array over the other, so both lengths affect how many multiply-and-sum steps happen.
Understanding how correlation scales helps you reason about performance when working with signals or time series data, a common task in data science and engineering.
"What if we used np.correlate() with two arrays of equal length n, but one array was mostly zeros? How would the time complexity change?"