Overview - Wilcoxon signed-rank test

What is it?

The Wilcoxon signed-rank test is a way to compare two related sets of numbers to see if they are different. It is used when you cannot assume the data follows a normal distribution. Instead of looking at the raw numbers, it looks at the differences and their ranks. This test helps decide if one set tends to have higher or lower values than the other.

Why it matters

Sometimes data does not follow the usual bell curve shape, so common tests like the t-test don't work well. The Wilcoxon signed-rank test solves this by not relying on that assumption. Without it, we might wrongly say two related groups are the same or different, leading to bad decisions in medicine, business, or science.

Where it fits

Before learning this, you should understand basic statistics concepts like paired data and hypothesis testing. After this, you can explore other non-parametric tests or learn how to interpret test results in real-world studies.

Mental Model

Core Idea

The Wilcoxon signed-rank test checks whether paired differences are centered on zero (a median difference of zero, assuming the differences are roughly symmetric) by ranking the absolute differences and considering their signs.

Think of it like...

Imagine you and a friend taste two versions of the same recipe. Instead of just saying which one you like better, you both rate how much better or worse each version is compared to the other. Then, you rank these differences by how strong your feelings are, not just by the raw scores.

Paired Data Differences
┌─────────────────┐
│ Pair 1: A1, B1  │
│ Difference = D1 │
├─────────────────┤
│ Pair 2: A2, B2  │
│ Difference = D2 │
├─────────────────┤
│ ...             │
├─────────────────┤
│ Pair n: An, Bn  │
│ Difference = Dn │
└─────────────────┘

Process:
1. Calculate differences D = B - A
2. Remove zero differences
3. Rank absolute values |D|
4. Sum ranks for positive and negative differences
5. Test if sums differ significantly
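
Steps 1 to 4 of this process can be sketched in a few lines. The arrays a and b are made-up illustrative values, and rankdata from SciPy is one convenient way to rank (it assigns average ranks to ties); treat this as a rough sketch rather than a reference implementation.

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical paired measurements (illustrative values only).
a = np.array([10.0, 12.0, 9.0, 11.0, 14.0, 8.0])
b = np.array([12.0, 11.0, 13.0, 11.0, 19.0, 11.0])

d = b - a                      # step 1: differences D = B - A
d = d[d != 0]                  # step 2: remove zero differences
ranks = rankdata(np.abs(d))    # step 3: rank |D| (ties would share an average rank)
w_plus = ranks[d > 0].sum()    # step 4: rank sum of the positive differences
w_minus = ranks[d < 0].sum()   #         rank sum of the negative differences

print(w_plus, w_minus)
```

For these values the positive rank sum is 14 and the negative rank sum is 1. Step 5 then asks how unusual such a lopsided split would be if positive and negative signs were equally likely.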

Build-Up - 6 Steps

1

Foundation: Understanding Paired Data Differences

Concept: Paired data means each item in one group matches exactly one item in the other group.

Imagine measuring blood pressure before and after a treatment for the same patients. Each patient's before and after readings form a pair. The difference between these paired values is what we analyze.

Result

You get a list of differences, one per pair, showing how much each subject changed.

Understanding paired differences is key because the test focuses on these differences, not the raw values.
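
As a small illustration of the blood pressure example, the differences could be computed as below. The readings are invented for the sketch and the variable names are hypothetical.

```python
# Hypothetical systolic readings (mmHg) for five patients; the values are made up.
before = [150, 142, 138, 160, 155]
after = [144, 140, 139, 151, 155]

# One difference per patient: the after reading minus the before reading.
differences = [post - pre for pre, post in zip(before, after)]
print(differences)
```

Each entry belongs to one patient, so the order of the two lists must match. The last patient shows no change, which produces a zero difference.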

2

Foundation: Why Use Non-Parametric Tests

3

Intermediate: Calculating Signed Ranks

4

Intermediate: Performing the Wilcoxon Test in SciPy

5

Advanced: Handling Zero Differences and Ties

6

Expert: Exact vs Approximate p-values and Continuity Correction

Under the Hood

The test calculates differences between paired samples, ranks their absolute values, and sums ranks separately for positive and negative differences. The test statistic is the smaller of these sums. Under the null hypothesis, the distribution of this statistic is known, allowing calculation of p-values either exactly (small samples) or approximately (large samples).

Why designed this way?

The Wilcoxon signed-rank test was created to provide a non-parametric alternative to the paired t-test, avoiding assumptions about normality. Ranking differences reduces sensitivity to outliers and skewed data, making the test robust and widely applicable.

Paired Samples
┌──────────────────────────┐
│ Sample A: x1             │
│ Sample B: y1             │
│ Difference: d1 = y1 - x1 │
├──────────────────────────┤
│ Sample A: x2             │
│ Sample B: y2             │
│ Difference: d2 = y2 - x2 │
├──────────────────────────┤
│ ...                      │
├──────────────────────────┤
│ Sample A: xn             │
│ Sample B: yn             │
│ Difference: dn = yn - xn │
└──────────────────────────┘

Process:
1. Remove zero differences
2. Rank |d|
3. Assign signs to ranks
4. Sum positive ranks (W+)
5. Sum negative ranks (W-)
6. Test statistic = min(W+, W-)
7. Calculate p-value from distribution
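
The last step, getting a p-value from the null distribution, can be illustrated for small samples by brute force. The helper exact_two_sided_p below is a hypothetical name, and the sketch assumes a two-sided test with statistic min(W+, W-); it is not how any particular library computes the value.

```python
from itertools import product

def exact_two_sided_p(ranks, observed_stat):
    """Exact two-sided p-value for T = min(W+, W-), by enumerating sign patterns.

    Under the null hypothesis each rank is equally likely to carry a plus
    or a minus sign, so all 2**n sign patterns have the same probability.
    """
    total = sum(ranks)
    count = 0
    n_patterns = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w_plus = sum(r for r, s in zip(ranks, signs) if s)
        stat = min(w_plus, total - w_plus)
        n_patterns += 1
        if stat <= observed_stat:
            count += 1  # pattern at least as extreme as the observed one
    return count / n_patterns

# Five nonzero differences with ranks 1..5 and an observed statistic of 1.
print(exact_two_sided_p([1, 2, 3, 4, 5], 1))  # 4 of 32 patterns -> 0.125
```

The loop visits 2**n sign patterns, so this approach is only practical for small n.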

Myth Busters - 4 Common Misconceptions

Quick: Does the Wilcoxon signed-rank test compare raw values or differences? Commit to your answer.

Common Belief: People often think the test compares the original paired values directly.

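Reality: The test works on the differences within each pair. The raw values matter only through those differences and the ranks of their absolute values.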

Quick: Is the Wilcoxon test only for large samples? Commit to your answer.

Common Belief: Some believe the test requires large samples to be valid.

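Reality: The test is valid for small samples, where p-values can be computed exactly. Large samples typically rely on an approximation instead.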

Quick: Does the test assume data is normally distributed? Commit to your answer.

Common Belief: Many think the Wilcoxon signed-rank test assumes normality like the t-test.

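Reality: The test is non-parametric and does not assume normality. It does rely on the differences being symmetric around the median.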

Quick: Should zero differences be included in the test? Commit to your answer.

Common Belief: Some think zero differences should be ranked and included.

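Reality: In the procedure described here, zero differences are removed before ranking because they carry no information about direction.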

Expert Zone

1

The test statistic's distribution changes with sample size, so exact p-values are preferred for small samples but computationally expensive for large ones.

2

Continuity correction in normal approximation improves p-value accuracy but can slightly change results, so its use depends on context.

3

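Averaging the ranks of tied absolute differences treats the tied values equally, but ties change the null distribution of the statistic, so exact tables no longer apply strictly and test power can shift slightly.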
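
The normal approximation and the continuity correction mentioned in this list can be sketched as follows. The function normal_approx_p is a hypothetical helper, and it assumes n nonzero differences with no ties and a two-sided test; it is an illustration, not a copy of any library's code.

```python
import math

def normal_approx_p(stat, n, correction=False):
    """Two-sided p-value for T = min(W+, W-) via the normal approximation.

    Assumes n nonzero differences with no ties, so that under the null
    hypothesis W+ has mean n(n+1)/4 and variance n(n+1)(2n+1)/24.
    """
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    distance = abs(stat - mean)
    if correction:
        # Move half a unit toward the mean because the statistic is discrete.
        distance = max(distance - 0.5, 0.0)
    z = distance / sd
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(z / math.sqrt(2))

p_plain = normal_approx_p(stat=40, n=20)
p_corrected = normal_approx_p(stat=40, n=20, correction=True)
print(p_plain, p_corrected)
```

The corrected p-value is slightly larger than the uncorrected one, which is the small change in results that the continuity correction introduces.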

When NOT to use

Avoid the Wilcoxon signed-rank test when the two samples are not paired or when the differences are not symmetric around the median. Use the Mann-Whitney U test for independent samples, or the sign test when only the direction of each difference matters.
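
For comparison, a sign test needs only the direction of each difference. The sketch below uses a hypothetical helper sign_test_p and computes a two-sided p-value from the binomial distribution with success probability 0.5; the input values are invented.

```python
from math import comb

def sign_test_p(differences):
    """Two-sided sign test: uses only the direction of each nonzero difference."""
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    n_pos = sum(1 for d in nonzero if d > 0)
    k = min(n_pos, n - n_pos)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical differences: eight positive, two negative.
print(sign_test_p([3, 1, 4, -2, 5, 2, 6, -1, 2, 3]))  # 0.109375
```

Because the magnitudes are ignored, the result would be the same if every positive difference were 100 times larger.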

Production Patterns

In real-world studies, the Wilcoxon test is often used in clinical trials to compare before-and-after treatment effects when data is skewed. It is also common in A/B testing when assumptions for parametric tests fail. Automated pipelines use SciPy's implementation with checks for zero differences and ties.
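
A pipeline step of this kind might look like the sketch below. The function name paired_wilcoxon_report and the returned dictionary keys are hypothetical; only scipy.stats.wilcoxon with its default settings is taken from SciPy.

```python
import numpy as np
from scipy.stats import wilcoxon

def paired_wilcoxon_report(before, after):
    """Run the test and report how many zero and tied differences were seen."""
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    if before.shape != after.shape:
        raise ValueError("paired samples must have the same length")

    d = after - before
    nonzero = d[d != 0]
    n_zero = int(d.size - nonzero.size)
    # A tie means two or more nonzero differences share an absolute value.
    n_unique = np.unique(np.abs(nonzero)).size
    has_ties = bool(n_unique < nonzero.size)

    result = wilcoxon(after, before)  # default settings
    return {
        "statistic": float(result.statistic),
        "pvalue": float(result.pvalue),
        "n_zero": n_zero,
        "has_ties": has_ties,
    }

report = paired_wilcoxon_report([10, 12, 9, 14, 8], [12, 11, 13, 19, 11])
print(report)
```

Reporting the zero and tie counts alongside the p-value helps a reader judge the result. When zeros or ties are present, SciPy may issue a warning or fall back to an approximate p-value, depending on the version.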

Connections

Paired t-test

Alternative test for paired data assuming normality

Understanding the Wilcoxon test clarifies when to choose non-parametric methods over parametric ones like the paired t-test.

Sign test

Simpler non-parametric test focusing only on direction of differences

Knowing the Wilcoxon test helps appreciate the added power gained by considering ranks, not just signs.

Rank-based methods in economics

Uses ranking of data to reduce impact of outliers and non-normality

Seeing Wilcoxon as a rank-based method connects statistics to economic models that rely on ranks to handle irregular data.

Common Pitfalls

#1 Including zero differences in ranking

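Wrong approach:
    from scipy.stats import rankdata
    differences = [0, 2, -3, 0, 5]
    ranks = rankdata([abs(d) for d in differences])  # zeros get ranked too

Correct approach:
    from scipy.stats import rankdata
    differences = [d for d in [0, 2, -3, 0, 5] if d != 0]  # zeros removed
    ranks = rankdata([abs(d) for d in differences])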

Root cause: Not realizing that zero differences carry no information about the direction of change.

#2 Using Wilcoxon test on independent samples

Wrong approach: scipy.stats.wilcoxon(sample1, sample2)  # samples are independent

Correct approach: scipy.stats.mannwhitneyu(sample1, sample2)  # for independent samples

Root cause: Confusing paired and independent sample tests.

#3 Ignoring ties in absolute differences

Wrong approach: Assign ranks ignoring ties, e.g., ranks 1, 2, 3 for values [1, 1, 2]

Correct approach: Assign average ranks for ties, e.g., ranks 1.5, 1.5, 3 for values [1, 1, 2]

Root cause: Not knowing how to handle tied ranks properly.
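
The two ranking behaviors for the values [1, 1, 2] can be reproduced with SciPy's rankdata. This is only an illustration of the ranking step, using the method names that rankdata accepts.

```python
from scipy.stats import rankdata

values = [1, 1, 2]

# "ordinal" gives every value a distinct rank, even when values are tied.
ordinal_ranks = rankdata(values, method="ordinal")

# "average" gives tied values the mean of the ranks they would occupy.
average_ranks = rankdata(values, method="average")

print(ordinal_ranks)
print(average_ranks)
```

With average ranks the two tied values each receive 1.5, so neither is favored by its position in the list.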

Key Takeaways

The Wilcoxon signed-rank test compares paired data by ranking the absolute differences and considering their signs.

It is a non-parametric test that does not assume normal distribution, making it useful for skewed or small datasets.

Zero differences are excluded, and ties are handled by averaging ranks to maintain fairness.

SciPy's implementation can provide exact p-values for small samples and approximate (normal) p-values for larger samples; the continuity correction is optional and off by default.

Choosing the Wilcoxon test over parametric alternatives depends on data characteristics and study design.