Spearman correlation in SciPy - Time & Space Complexity
We want to know how the time needed to calculate Spearman correlation changes as the data size grows.
How does the work increase when we have more data points?
Analyze the time complexity of the following code snippet.
```python
import numpy as np
from scipy.stats import spearmanr

n = 1000  # example size

# Generate two random arrays
x = np.random.rand(n)
y = np.random.rand(n)

# Calculate Spearman correlation
corr, p_value = spearmanr(x, y)
```
This code calculates the Spearman correlation between two arrays of n numbers each.
To analyze it, identify the operations that repeat: loops, recursion, and array traversals.
- Primary operation: Sorting both arrays to get their ranks.
- How many times: Sorting takes about n log n steps for each array.
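The ranking step can be sketched with `np.argsort`, which makes the sorting cost explicit. This is a simplified version (the hypothetical `ranks` helper below ignores ties, which `spearmanr` handles by averaging tied ranks):

```python
import numpy as np

def ranks(a):
    # argsort is the dominant cost: O(n log n) comparisons
    order = np.argsort(a)
    # Scattering rank positions back is only O(n)
    r = np.empty_like(order)
    r[order] = np.arange(len(a))
    return r

print(ranks(np.array([30.0, 10.0, 20.0])))  # → [2 0 1]
```

Everything after ranking (computing the correlation of the two rank arrays) is a linear pass, so the sort dominates.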
As the number of data points increases, the sorting step takes more time, growing a bit faster than just the number of points.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 30 steps |
| 100 | About 700 steps |
| 1000 | About 10,000 steps |
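The table's rounded figures can be reproduced by evaluating n log2 n directly (the table rounds to friendly numbers; the constant factor and log base are implementation details that big-O notation ignores):

```python
import math

# Approximate comparison counts for an n log n sort
for n in (10, 100, 1000):
    print(n, round(n * math.log2(n)))
```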
Pattern observation: The time grows faster than the input size but slower than its square, roughly like n times log n.
Time Complexity: O(n log n)
This means the time to compute Spearman correlation grows a bit faster than the number of data points because of sorting.
[X] Wrong: "Calculating Spearman correlation takes only linear time because it just compares pairs."
[OK] Correct: The ranks must be computed first, which requires sorting, and comparison-based sorting takes O(n log n), more than linear time.
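One way to see that ranking is the essential step: Spearman's correlation is, by definition, the Pearson correlation of the ranks. A sketch using SciPy's `rankdata` (which sorts internally) should match `spearmanr` for tie-free data:

```python
import numpy as np
from scipy.stats import spearmanr, rankdata, pearsonr

rng = np.random.default_rng(0)
x = rng.random(50)
y = rng.random(50)

# Rank both arrays (O(n log n) sort inside rankdata),
# then take the Pearson correlation of the ranks (O(n)).
manual = pearsonr(rankdata(x), rankdata(y))[0]
scipy_corr = spearmanr(x, y)[0]
print(manual, scipy_corr)
```

Because the ranking cannot be skipped, the O(n log n) sort is a floor on the cost for unsorted input.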
Knowing how Spearman correlation scales helps you explain performance when working with large datasets, showing you understand both statistics and efficiency.
"What if the input arrays were already sorted? How would the time complexity change?"