Correlation (correlate) in SciPy - Time & Space Complexity
We want to understand how the time needed to compute a correlation grows as the input data gets bigger.
Specifically, how does the work scale when we apply SciPy's correlate function to longer arrays?
Analyze the time complexity of the following code snippet.
```python
import numpy as np
from scipy.signal import correlate

x = np.random.rand(n)
y = np.random.rand(n)
# method='direct' forces the sliding multiply-and-sum algorithm analyzed below;
# SciPy's default, method='auto', may switch to an FFT for large arrays.
result = correlate(x, y, mode='full', method='direct')
```
This code computes the full cross-correlation of two length-n arrays using SciPy's correlate function.
- Primary operation: Multiplying and summing pairs of elements from the two arrays at different shifts.
- How many times: With mode='full' and two length-n inputs there are exactly 2n - 1 output shifts, and each shift multiplies and sums up to n overlapping element pairs.
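To make the primary operation concrete, here is a pure-Python sketch of the direct algorithm (the function name `direct_correlate` is illustrative; its output matches `scipy.signal.correlate(x, y, mode='full')` for real inputs). The nested loops show the shifts-times-overlap structure explicitly:

```python
def direct_correlate(x, y):
    """Direct 'full' cross-correlation of two sequences."""
    n, m = len(x), len(y)
    out = [0.0] * (n + m - 1)
    for k in range(n + m - 1):       # 2n - 1 shifts when n == m
        for i in range(n):           # up to n multiply-adds per shift
            j = m - 1 - k + i        # index into y for this shift
            if 0 <= j < m:           # only overlapping pairs contribute
                out[k] += x[i] * y[j]
    return out
```

For example, correlating [1, 2, 3] with [1, 1, 1] produces the sliding sums [1, 3, 6, 5, 3].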
As the input size n grows, the number of operations grows roughly with the square of n.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 100 |
| 100 | About 10,000 |
| 1000 | About 1,000,000 |
Pattern observation: Doubling the input size roughly quadruples the work needed.
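You can verify the quadrupling pattern by counting multiply-adds instead of timing. This hypothetical `multiply_add_count` helper sums the overlap length at each of the 2n - 1 shifts; the total comes out to exactly n * n, since every element of one array is paired with every element of the other exactly once across all shifts:

```python
def multiply_add_count(n):
    """Count multiply-adds in a direct 'full' correlation of two length-n arrays."""
    count = 0
    for k in range(2 * n - 1):        # one pass per output shift
        lo = max(0, k - n + 1)        # first overlapping index at shift k
        hi = min(n, k + 1)            # one past the last overlapping index
        count += hi - lo              # overlap length at this shift
    return count
```

Doubling n (say, from 10 to 20) multiplies the count by exactly 4, matching the table above.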
Time Complexity: O(n²) for the direct method.
This means the time needed grows roughly with the square of the input size.
[X] Wrong: "Correlation runs in linear time because it just looks at each element once."
[OK] Correct: Correlation compares every element of one array with every element of the other, so the work grows much faster than just looking once.
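To see why "looking at each element once" is not enough: a single linear pass computes the dot product at just one alignment, while the full correlation needs one such pass per shift. A minimal illustration (`one_shift` is a hypothetical helper name):

```python
def one_shift(x, y):
    """One linear pass: the dot product at a single alignment (zero lag).
    This is only 1 of the 2n - 1 shifts the full correlation must compute."""
    return sum(a * b for a, b in zip(x, y))
```

For [1, 2, 3] and [1, 1, 1] this returns 6, which is only the middle element of the full correlation [1, 3, 6, 5, 3].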
Understanding how correlation scales helps you explain performance when working with signals or time series data in real projects.
"What if we used FFT-based correlation instead? How would the time complexity change?"