Wilcoxon signed-rank test in SciPy - Time & Space Complexity

Time Complexity: Wilcoxon signed-rank test

O(n log n)

Understanding Time Complexity

We want to understand how the time needed to run the Wilcoxon signed-rank test changes as the input data grows.

Specifically, how does the test's execution time grow when we have more paired samples?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.


import numpy as np
from scipy.stats import wilcoxon

n = 100  # example size
# paired sample data
x = np.random.rand(n)
y = np.random.rand(n)

# perform Wilcoxon signed-rank test
stat, p = wilcoxon(x, y)

This code runs the Wilcoxon signed-rank test on two paired samples of size n.

Identify Repeating Operations

Identify the loops, recursion, and array traversals that repeat.

Primary operations: Computing the paired differences (O(n)), sorting the absolute differences to assign ranks (O(n log n)), and summing the signed ranks (O(n)).
How many times: The difference and summation steps each touch every pair once, while the sort needs on the order of n log n comparisons, so ranking dominates.
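
The three steps listed above can be sketched by hand to make the cost of each visible. This is a minimal illustration, not SciPy's actual implementation: the helper name signed_rank_statistic is hypothetical, and it assumes the default two-sided test on data with no ties and no zero differences.

```python
import numpy as np
from scipy.stats import rankdata, wilcoxon

def signed_rank_statistic(x, y):
    """Hand-rolled two-sided signed-rank statistic (hypothetical helper, for illustration only)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)  # O(n): paired differences
    d = d[d != 0]                    # O(n): discard zero differences
    ranks = rankdata(np.abs(d))      # O(n log n): ranking sorts the absolute differences
    w_plus = ranks[d > 0].sum()      # O(n): rank sum of positive differences
    w_minus = ranks[d < 0].sum()     # O(n): rank sum of negative differences
    return min(w_plus, w_minus)      # two-sided statistic: the smaller rank sum

rng = np.random.default_rng(0)       # seeded so the comparison is reproducible
x = rng.random(100)
y = rng.random(100)

manual = signed_rank_statistic(x, y)
stat, p = wilcoxon(x, y)
print(manual, stat)                  # the two statistics should agree
```

Only the rankdata call costs more than linear time; every other line is a single pass over the array.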

How Execution Grows With Input

As the number of paired samples n increases, the sorting for ranking causes time to grow as O(n log n).

Input Size (n)	Approx. Operations
10	About 33 operations (10 * log2(10) ≈ 33)
100	About 664 operations (100 * log2(100) ≈ 664)
1000	About 9966 operations (1000 * log2(1000) ≈ 9966)

Pattern observation: The work grows roughly as n log n as n grows.
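
As a quick check of the arithmetic, the short sketch below computes n * log2(n) for the same input sizes. The function name n_log2_n is just an illustrative choice, and the figures are rough operation counts, not measured run times.

```python
import math

def n_log2_n(n):
    """Rough operation-count estimate for a comparison sort on n items."""
    return n * math.log2(n)

for n in (10, 100, 1000):
    # Tab-separated to mirror the table layout: input size, estimated operations
    print(f"{n}\t{n_log2_n(n):.0f}")
```

Multiplying n by 10 multiplies the estimate by a little more than 10, which is the signature of n log n growth.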

Final Time Complexity

Time Complexity: O(n log n)

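This is dominated by the sorting step required for ranking the absolute differences, assuming the p-value is obtained from the normal approximation and not from an exact distribution; the remaining steps are linear in n.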
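
One way to sanity-check this estimate is to time the call at a few sizes. The sketch below is an informal measurement, not a benchmark: the helper time_wilcoxon, the chosen sizes, and the repeat count are all assumptions, and absolute timings will vary by machine and SciPy version.

```python
import time
import numpy as np
from scipy.stats import wilcoxon

def time_wilcoxon(n, repeats=3, seed=0):
    """Best-of-`repeats` wall-clock time for wilcoxon on n random pairs (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = rng.random(n)
    y = rng.random(n)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        wilcoxon(x, y)
        best = min(best, time.perf_counter() - start)
    return best

timings = {n: time_wilcoxon(n) for n in (1_000, 10_000, 100_000)}
for n, seconds in timings.items():
    print(f"n={n:>7}: {seconds:.6f} s")
```

If the n log n estimate holds, each tenfold increase in n should raise the time by somewhat more than a factor of ten, although fixed overhead can mask this at small sizes.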

Common Mistake

[X] Wrong: "The Wilcoxon test is O(n) since we just process each pair once."

[OK] Correct: Ranking requires sorting the absolute differences, which is O(n log n).
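
To see why ranking cannot be done in a single pass, note that each value's rank depends on how it compares with every other value. A minimal sketch, assuming no tied values (the array abs_diffs is made-up example data):

```python
import numpy as np
from scipy.stats import rankdata

abs_diffs = np.array([0.7, 0.1, 0.4, 0.9, 0.2])   # example absolute differences

order = np.argsort(abs_diffs)                     # O(n log n): the sorting step
ranks = np.empty(len(abs_diffs))
ranks[order] = np.arange(1, len(abs_diffs) + 1)   # O(n): assign ranks 1..n in sorted order

print(ranks)                 # [4. 1. 3. 5. 2.]
print(rankdata(abs_diffs))   # same ranks from SciPy's helper
```

The linear-time assignment is only possible after the sort has established the order, which is where the log n factor comes from.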

Interview Connect

Understanding that statistical tests like Wilcoxon involve sorting helps evaluate scalability for large datasets.

Self-Check

"What if the input data were already sorted by difference magnitude? How would that affect the time complexity?"