Wilcoxon signed-rank test in SciPy - Time & Space Complexity
We want to understand how the time needed to run the Wilcoxon signed-rank test changes as the input data grows.
Specifically, how does the test's execution time grow when we have more paired samples?
Analyze the time complexity of the following code snippet.
```python
import numpy as np
from scipy.stats import wilcoxon

n = 100  # example size

# paired sample data
x = np.random.rand(n)
y = np.random.rand(n)

# perform Wilcoxon signed-rank test
stat, p = wilcoxon(x, y)
```
This code runs the Wilcoxon signed-rank test on two paired samples of size n.
Identify the loops, recursion, and array traversals that repeat.
- Primary operations: Computing differences (O(n)), sorting absolute differences for ranking (O(n log n)), and summing signed ranks (O(n)).
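- How many times: The difference and summation passes each visit every element once (O(n)), while the sort behind the ranking needs on the order of n log n comparisons, so ranking dominates.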
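The three steps listed above can be sketched by hand to see where each cost arises. This is a minimal illustration, not SciPy's internal code: the helper name `signed_rank_sums` is hypothetical, and dropping zero differences is an assumption chosen to mirror the default `zero_method` of `wilcoxon`.

```python
import numpy as np
from scipy.stats import rankdata


def signed_rank_sums(x, y):
    """Return (r_plus, r_minus): rank sums of the positive and negative differences.

    Hypothetical helper for illustration only; not SciPy's implementation.
    """
    # Step 1: paired differences, one pass over the data -> O(n)
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Assumption: zero differences are discarded before ranking
    d = d[d != 0]
    # Step 2: rank the absolute differences; ranking sorts -> O(n log n)
    ranks = rankdata(np.abs(d))
    # Step 3: sum the ranks by sign, one pass each -> O(n)
    r_plus = float(ranks[d > 0].sum())
    r_minus = float(ranks[d < 0].sum())
    return r_plus, r_minus
```

For the default two-sided test, the statistic SciPy reports is the smaller of the two sums, so `min(signed_rank_sums(x, y))` should match `wilcoxon(x, y).statistic` on data without zero differences.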
As the number of paired samples n increases, the sorting for ranking causes time to grow as O(n log n).
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 33 operations (10 * log2(10) ≈ 33) |
| 100 | About 664 operations (100 * log2(100) ≈ 664) |
| 1000 | About 9966 operations (1000 * log2(1000) ≈ 9966) |
Pattern observation: The work grows roughly as n log n as n grows.
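The figures in the table come from evaluating n * log2(n) directly. The short sketch below reproduces that arithmetic; the function name `approx_operations` is made up for this example, and the result is a growth estimate, not a count of actual machine operations.

```python
import math


def approx_operations(n):
    """Estimate the sorting work for n paired samples as n * log2(n)."""
    return n * math.log2(n)


if __name__ == "__main__":
    # Print the estimate for the input sizes used in the table
    for n in (10, 100, 1000):
        print(f"n = {n:>4}: about {approx_operations(n):.0f} operations")
```

Multiplying n by 10 multiplies the estimate by somewhat more than 10, which is the signature of n log n growth.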
Time Complexity: O(n log n)
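This is dominated by the sorting step required to rank the absolute differences. The bound describes computing the test statistic; when SciPy uses the normal approximation for the p-value, the remaining work does not exceed it. The exact p-value that SciPy can use for small samples is computed by a different procedure with its own cost.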
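One way to check the estimate empirically is to time `wilcoxon` at increasing sample sizes. The following is a rough sketch under stated assumptions: `time_wilcoxon` is a hypothetical helper, the sizes are arbitrary, and wall-clock timings vary with hardware, SciPy version, and fixed per-call overhead, so small n will not follow the curve closely.

```python
import time

import numpy as np
from scipy.stats import wilcoxon


def time_wilcoxon(sizes, repeats=3, seed=0):
    """Return {n: best wall-clock seconds} for wilcoxon on random paired data."""
    rng = np.random.default_rng(seed)
    timings = {}
    for n in sizes:
        x = rng.random(n)
        y = rng.random(n)
        best = float("inf")
        # Keep the fastest of several runs to reduce timing noise
        for _ in range(repeats):
            start = time.perf_counter()
            wilcoxon(x, y)
            best = min(best, time.perf_counter() - start)
        timings[n] = best
    return timings


if __name__ == "__main__":
    for n, seconds in time_wilcoxon([1_000, 10_000, 100_000]).items():
        print(f"n = {n:>7}: {seconds:.6f} s")
```

If the O(n log n) estimate holds, each tenfold increase in n should raise the time by a factor somewhat above 10 once n is large enough for sorting to outweigh the fixed overhead.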
[X] Wrong: "The Wilcoxon test is O(n) since we just process each pair once."
[OK] Correct: Ranking requires sorting the absolute differences, which is O(n log n).
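To see why ranking cannot be done in a single O(n) pass, it helps to write ranking out explicitly. The sketch below assigns ranks with a sort followed by a linear scatter; `ranks_via_sort` is a hypothetical name, and it assumes no tied values (SciPy's `rankdata` additionally averages the ranks of ties).

```python
import numpy as np


def ranks_via_sort(values):
    """Assign ranks 1..n to distinct values using an explicit sort."""
    values = np.asarray(values, dtype=float)
    # Sorting finds the position of each element in the ordering -> O(n log n)
    order = np.argsort(values, kind="stable")
    # Scatter the ranks back to the original positions -> O(n)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks
```

For example, `ranks_via_sort([0.5, 2.0, 1.0, 3.5])` gives ranks 1, 3, 2, 4. The scatter step is linear, so the sort is the only part that exceeds O(n).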
Knowing that rank-based tests such as the Wilcoxon signed-rank test sort their input helps you judge how they will scale to large datasets.
"What if the input data were already sorted by difference magnitude? How would that affect the time complexity?"