
Wilcoxon signed-rank test in SciPy - Deep Dive

Overview - Wilcoxon signed-rank test
What is it?
The Wilcoxon signed-rank test compares two related sets of measurements, such as before-and-after readings on the same subjects, to see whether one tends to be higher or lower than the other. It is used when you cannot assume the paired differences follow a normal distribution. Instead of working with the raw numbers, it uses the sign of each paired difference and the rank of its absolute value.
Why it matters
Sometimes data does not follow the usual bell curve shape, so common tests like the paired t-test can give misleading results, especially with small samples. The Wilcoxon signed-rank test avoids this by not relying on the normality assumption. Without it, we might wrongly say two related groups are the same or different, leading to bad decisions in medicine, business, or science.
Where it fits
Before learning this, you should understand basic statistics concepts like paired data and hypothesis testing. After this, you can explore other non-parametric tests or learn how to interpret test results in real-world studies.
Mental Model
Core Idea
The Wilcoxon signed-rank test checks whether the paired differences are centered on zero (a median difference of zero, assuming the differences are symmetric) by ranking the absolute differences and considering their signs.
Think of it like...
Imagine you and a friend taste two versions of the same recipe. Instead of just saying which one you like better, you both rate how much better or worse each version is compared to the other. Then, you rank these differences by how strong your feelings are, not just by the raw scores.
Paired Data Differences
┌──────────────────┐
│ Pair 1: A1, B1   │
│ Difference = D1  │
├──────────────────┤
│ Pair 2: A2, B2   │
│ Difference = D2  │
├──────────────────┤
│ ...              │
├──────────────────┤
│ Pair n: An, Bn   │
│ Difference = Dn  │
└──────────────────┘

Process:
1. Calculate differences D = B - A
2. Remove zero differences
3. Rank absolute values |D|
4. Sum ranks for positive and negative differences
5. Test if sums differ significantly
Build-Up - 6 Steps
1
Foundation: Understanding Paired Data Differences
Concept: Paired data means each item in one group matches exactly one item in the other group.
Imagine measuring blood pressure before and after a treatment for the same patients. Each patient's before and after readings form a pair. The difference between these paired values is what we analyze.
Result
You get a list of differences, one per pair, showing how much each subject changed.
Understanding paired differences is key because the test focuses on these differences, not the raw values.
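As a minimal sketch of this step, assuming made-up blood pressure readings for six patients (the arrays `before` and `after` are hypothetical), the paired differences are an element-wise subtraction:

```python
import numpy as np

# Hypothetical systolic blood pressure readings (mmHg) for the same six patients.
before = np.array([142, 138, 150, 147, 135, 160])
after = np.array([135, 136, 151, 141, 130, 149])

# One difference per patient; index i in both arrays must be the same patient.
differences = after - before
print(differences.tolist())  # [-7, -2, 1, -6, -5, -11]
```

Keeping both arrays in the same subject order is what makes the subtraction meaningful.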
2
Foundation: Why Use Non-Parametric Tests
Concept: Non-parametric tests do not assume data follows a specific shape like the normal distribution.
Many tests assume data is bell-shaped. But real data can be skewed or have outliers. Non-parametric tests like Wilcoxon work well even when data is not normal.
Result
You can analyze paired data without assuming normality, although the Wilcoxon test still expects the differences to be roughly symmetric around their median.
Knowing when to use non-parametric tests prevents wrong conclusions from invalid assumptions.
3
Intermediate: Calculating Signed Ranks
🤔 Before reading on: Do you think the test uses raw differences or their ranks? Commit to your answer.
Concept: The test ranks the absolute differences but keeps track of their original signs (positive or negative).
First, ignore zeros. Then, rank the absolute differences from smallest to largest. Assign each rank the sign of the original difference. Finally, sum the positive and negative signed ranks separately.
Result
You get two sums: one for positive differences and one for negative differences.
Ranking reduces the effect of extreme values, making the test robust to outliers.
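The signed-rank calculation described above can be sketched with `scipy.stats.rankdata`. The differences below are hypothetical and were chosen to have no zeros and no tied magnitudes, so this is an illustration of the idea rather than a full implementation:

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical paired differences (no zeros, no tied magnitudes).
differences = np.array([-7, -2, 1, -6, -5, -11])

# Rank the absolute differences from smallest (rank 1) to largest.
ranks = rankdata(np.abs(differences))

# Re-attach the original sign to each rank.
signed_ranks = np.sign(differences) * ranks

# Sum the ranks of the positive and negative differences separately.
w_plus = ranks[differences > 0].sum()
w_minus = ranks[differences < 0].sum()

print(signed_ranks.tolist())  # [-5.0, -2.0, 1.0, -4.0, -3.0, -6.0]
print(w_plus, w_minus)        # 1.0 20.0
```

The two sums always add up to n(n + 1)/2, here 21, which is a quick sanity check on the ranking.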
4
Intermediate: Performing the Wilcoxon Test in SciPy
🤔 Before reading on: Do you think the test returns a p-value or just a yes/no answer? Commit to your answer.
Concept: SciPy provides a function to perform the Wilcoxon signed-rank test and returns a test statistic and p-value.
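Use scipy.stats.wilcoxon(x, y), where x and y are equal-length paired samples in the same order. SciPy tests the differences x - y, so you can also pass a single array of differences as x. The function returns a test statistic and a p-value to help decide if the differences are significant.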
Result
You get a test statistic number and a p-value indicating significance.
Using built-in functions avoids manual errors and speeds up analysis.
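A minimal usage sketch, assuming hypothetical `before` and `after` arrays for the same six subjects. It also shows that passing the differences directly gives the same result as passing both samples; the 0.05 threshold is a convention, not something the function enforces:

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical paired measurements for the same six subjects.
before = np.array([142, 138, 150, 147, 135, 160])
after = np.array([135, 136, 151, 141, 130, 149])

# Two-sample form: SciPy works on the element-wise differences of the pair.
result = wilcoxon(after, before)
print(result.statistic, result.pvalue)

# One-sample form: passing the differences directly gives the same answer.
result_from_diffs = wilcoxon(after - before)
print(result_from_diffs.statistic, result_from_diffs.pvalue)

# A common (but arbitrary) convention is to compare the p-value with 0.05.
alpha = 0.05
print("significant" if result.pvalue < alpha else "not significant")
```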
5
Advanced: Handling Zero Differences and Ties
🤔 Before reading on: Should zero differences be included in ranking or ignored? Commit to your answer.
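Concept: By default, zero differences are excluded because they provide no information about direction. Ties in absolute differences are handled by assigning average ranks.
In the classic procedure, which SciPy uses by default (zero_method="wilcox"), zero differences are removed before ranking. SciPy also offers zero_method="pratt" and zero_method="zsplit", which treat zeros differently. If two or more differences have the same absolute value, each gets the average of the ranks they would otherwise occupy.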
Result
The ranking process remains accurate and unbiased despite ties or zeros.
Proper handling of zeros and ties ensures the test's validity and fairness.
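The sketch below illustrates both ideas on hypothetical differences that contain zeros and tied magnitudes. It reproduces the default handling by hand and compares it with `wilcoxon`. With data like this, some SciPy versions emit a warning about switching to the normal approximation, which is expected:

```python
import numpy as np
from scipy.stats import rankdata, wilcoxon

# Tied magnitudes share the average of the ranks they would occupy.
print(rankdata([1, 1, 2]).tolist())  # [1.5, 1.5, 3.0]

# Hypothetical differences containing two zeros and some tied magnitudes.
differences = np.array([0, 2, -3, 0, 5, -2, 4, 6, -1, 7, 3, 8])

# Manual version of the default handling: drop zeros, then rank with ties averaged.
nonzero = differences[differences != 0]
ranks = rankdata(np.abs(nonzero))
w_plus = ranks[nonzero > 0].sum()
w_minus = ranks[nonzero < 0].sum()

# zero_method="wilcox" (SciPy's default) discards zeros before ranking.
default_result = wilcoxon(differences, zero_method="wilcox")
print(default_result.statistic == min(w_plus, w_minus))  # True

# zero_method="pratt" ranks the zeros too, then drops their ranks.
pratt_result = wilcoxon(differences, zero_method="pratt")
print(pratt_result.statistic, pratt_result.pvalue)
```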
6
Expert: Exact vs Approximate p-values and Continuity Correction
🤔 Before reading on: Do you think the test always uses exact p-values? Commit to your answer.
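Concept: For small samples, exact p-values can be computed; for larger samples, a normal approximation is used, optionally with a continuity correction.
SciPy can compute exact p-values for small datasets without zeros or ties, and by default it chooses between the exact and approximate calculation automatically (the sample-size cutoff depends on the SciPy version). For larger datasets, or when the automatic mode encounters zeros or ties, it uses a normal approximation. The continuity correction is controlled by the correction argument and is off by default.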
Result
You get reliable p-values appropriate for your sample size and data.
Knowing when and how p-values are calculated helps interpret results correctly and avoid misleading conclusions.
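To make the approximation concrete, here is a sketch that computes the normal-approximation p-value by hand, with and without a continuity correction, and prints it next to SciPy's result. It assumes 60 hypothetical, randomly generated differences (so no zeros or ties) and that the installed SciPy's automatic mode uses the approximation at this sample size. The helper `two_sided_p` is illustrative, not part of SciPy:

```python
import numpy as np
from scipy.stats import norm, rankdata, wilcoxon

# Hypothetical sample: 60 continuous differences, so no zeros or ties.
rng = np.random.default_rng(0)
differences = rng.normal(loc=0.3, scale=1.0, size=60)

n = len(differences)
ranks = rankdata(np.abs(differences))
w_plus = ranks[differences > 0].sum()

# Mean and standard deviation of W+ under the null hypothesis (no ties).
mean_w = n * (n + 1) / 4
sd_w = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)


def two_sided_p(w, continuity=False):
    """Two-sided p-value from the normal approximation (illustrative helper)."""
    # The continuity correction moves W half a unit toward its null mean.
    shift = 0.5 * np.sign(w - mean_w) if continuity else 0.0
    z = (w - mean_w - shift) / sd_w
    return 2 * norm.sf(abs(z))


p_plain = two_sided_p(w_plus)
p_corrected = two_sided_p(w_plus, continuity=True)

print(p_plain, wilcoxon(differences).pvalue)
print(p_corrected, wilcoxon(differences, correction=True).pvalue)
```

Because the correction pulls the statistic toward its null mean, the corrected p-value is never smaller than the uncorrected one.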
Under the Hood
The test calculates differences between paired samples, ranks their absolute values, and sums the ranks separately for positive and negative differences. For a two-sided test, SciPy reports the smaller of these two sums as the test statistic; for one-sided alternatives it reports the sum of the positive ranks. Under the null hypothesis, the distribution of this statistic is known, allowing calculation of p-values either exactly (small samples) or approximately (large samples).
Why designed this way?
The Wilcoxon signed-rank test was created to provide a non-parametric alternative to the paired t-test, avoiding assumptions about normality. Ranking differences reduces sensitivity to outliers and skewed data, making the test robust and widely applicable.
Paired Samples
┌──────────────────────────┐
│ Sample A: x1             │
│ Sample B: y1             │
│ Difference: d1 = y1 - x1 │
├──────────────────────────┤
│ Sample A: x2             │
│ Sample B: y2             │
│ Difference: d2 = y2 - x2 │
├──────────────────────────┤
│ ...                      │
├──────────────────────────┤
│ Sample A: xn             │
│ Sample B: yn             │
│ Difference: dn = yn - xn │
└──────────────────────────┘

Process:
1. Remove zero differences
2. Rank |d|
3. Assign signs to ranks
4. Sum positive ranks (W+)
5. Sum negative ranks (W-)
6. Test statistic = min(W+, W-) for a two-sided test
7. Calculate p-value from distribution
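The statement that the null distribution "is known" can be illustrated by brute force: under the null hypothesis every pattern of signs is equally likely, so for a small sample all 2^n patterns can be enumerated. The sketch below does this for eight hypothetical differences and prints the result next to SciPy's. It is meant to show the idea, not how SciPy computes the distribution internally:

```python
from itertools import product

import numpy as np
from scipy.stats import rankdata, wilcoxon

# Hypothetical differences: eight pairs, no zeros, no tied magnitudes.
differences = np.array([1.5, -0.4, 2.1, 3.3, -0.9, 2.8, 0.6, 4.0])

ranks = rankdata(np.abs(differences))
w_plus = ranks[differences > 0].sum()
w_minus = ranks[differences < 0].sum()
statistic = min(w_plus, w_minus)

# Under the null hypothesis every sign pattern is equally likely, so the
# distribution of W+ can be listed by trying all 2**n sign assignments.
null_w_plus = np.array([
    ranks[np.array(signs, dtype=bool)].sum()
    for signs in product([False, True], repeat=len(ranks))
])

# Two-sided exact p-value: double the smaller tail, capped at 1.
lower_tail = np.mean(null_w_plus <= w_plus)
upper_tail = np.mean(null_w_plus >= w_plus)
p_exact = min(1.0, 2 * min(lower_tail, upper_tail))

result = wilcoxon(differences)
print(statistic, result.statistic)
print(p_exact, result.pvalue)
```

Enumeration doubles in cost with every extra pair, which is one reason approximations are used for larger samples.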
Myth Busters - 4 Common Misconceptions
Quick: Does the Wilcoxon signed-rank test compare raw values or differences? Commit to your answer.
Common Belief: People often think the test compares the original paired values directly.
Reality: The test actually compares the differences between pairs, focusing on their ranks and signs.
Why it matters: Misunderstanding this leads to incorrect data preparation and invalid test results.
Quick: Is the Wilcoxon test only for large samples? Commit to your answer.
Common Belief: Some believe the test requires large samples to be valid.
Reality: The test works well for small samples and can compute exact p-values in those cases.
Why it matters: Ignoring small sample applicability may cause missed opportunities for valid analysis.
Quick: Does the test assume data is normally distributed? Commit to your answer.
Common Belief: Many think the Wilcoxon signed-rank test assumes normality like the t-test.
Reality: It does not assume normality; it is a non-parametric test designed for non-normal data.
Why it matters: Using parametric tests on non-normal data can lead to wrong conclusions.
Quick: Should zero differences be included in the test? Commit to your answer.
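Common Belief: Some think zero differences should always be ranked and included.
Reality: In the classic procedure, which is SciPy's default (zero_method="wilcox"), zero differences are excluded because they do not indicate direction. Alternatives such as zero_method="pratt" do rank the zeros, but that has to be a deliberate choice.
Why it matters: Handling zeros inconsistently changes the ranks and the effective sample size, and therefore the p-value.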
Expert Zone
1
The test statistic's distribution changes with sample size, so exact p-values are preferred for small samples but computationally expensive for large ones.
2
Continuity correction in normal approximation improves p-value accuracy but can slightly change results, so its use depends on context.
3
Handling ties by averaging ranks preserves fairness but can slightly affect test power and interpretation.
When NOT to use
Avoid the Wilcoxon signed-rank test when the two samples are not paired or when the differences are clearly not symmetric around their median. Use the Mann-Whitney U test for independent samples, or the sign test if only the direction of each difference matters.
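Both alternatives can be sketched with SciPy. The data below are hypothetical, and the sign test is written here as a binomial test on the number of positive differences using `scipy.stats.binomtest`, which assumes a reasonably recent SciPy release:

```python
import numpy as np
from scipy.stats import binomtest, mannwhitneyu

# Independent samples: two hypothetical groups of different people, so there
# is no pairing to exploit and the Mann-Whitney U test is the usual choice.
group_a = np.array([12.1, 14.3, 11.8, 15.2, 13.7, 12.9])
group_b = np.array([15.8, 16.4, 14.9, 17.1, 13.2, 16.0, 15.5])
u_result = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(u_result.statistic, u_result.pvalue)

# Paired data with clearly asymmetric differences: a sign test only uses the
# direction of each change, via a binomial test on the number of increases.
differences = np.array([0.2, 0.1, 0.3, -9.5, 0.4, 0.2, -12.0, 0.1, 0.5, 0.3])
nonzero = differences[differences != 0]
n_positive = int((nonzero > 0).sum())
sign_result = binomtest(n_positive, n=len(nonzero), p=0.5)
print(n_positive, sign_result.pvalue)
```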
Production Patterns
In real-world studies, the Wilcoxon test is often used in clinical trials to compare before-and-after treatment effects when data is skewed. It is also used in A/B testing when the same units are measured under both variants and the assumptions of parametric tests fail. Automated pipelines use SciPy's implementation with checks for zero differences and ties.
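One way such a pipeline check might look is sketched below. The function name `paired_wilcoxon_report`, its return keys, and the sample data are all hypothetical; only `scipy.stats.wilcoxon` is a real SciPy call. With zeros or ties in a small sample, some SciPy versions emit a warning, which a real pipeline might log:

```python
import numpy as np
from scipy.stats import wilcoxon


def paired_wilcoxon_report(x, y, alpha=0.05):
    """Hypothetical pipeline helper: validate the pairing, note zeros and
    ties, then run SciPy's test with its default settings."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1-D arrays of equal length")

    differences = x - y
    nonzero = differences[differences != 0]
    if nonzero.size == 0:
        raise ValueError("all differences are zero; nothing to test")

    magnitudes = np.abs(nonzero)
    result = wilcoxon(x, y)
    return {
        "n_pairs": int(x.size),
        "n_zero_differences": int(differences.size - nonzero.size),
        "has_tied_magnitudes": bool(np.unique(magnitudes).size < magnitudes.size),
        "statistic": float(result.statistic),
        "pvalue": float(result.pvalue),
        "reject_null": bool(result.pvalue < alpha),
    }


# Hypothetical scores for eight units measured under two conditions.
report = paired_wilcoxon_report(
    [81, 74, 90, 65, 79, 88, 72, 93],
    [76, 74, 81, 69, 70, 80, 73, 82],
)
print(report)
```

Reporting the zero and tie counts alongside the p-value makes it easier to see when the exact calculation was not available.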
Connections
Paired t-test
Alternative test for paired data assuming normality
Understanding the Wilcoxon test clarifies when to choose non-parametric methods over parametric ones like the paired t-test.
Sign test
Simpler non-parametric test focusing only on direction of differences
Knowing the Wilcoxon test helps appreciate the added power gained by considering ranks, not just signs.
Rank-based methods in economics
Uses ranking of data to reduce impact of outliers and non-normality
Seeing Wilcoxon as a rank-based method connects statistics to economic models that rely on ranks to handle irregular data.
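The connection to the paired t-test and to rank-based methods in general can be seen in a small comparison. The scores below are hypothetical: nine modest improvements and one extreme drop. The sketch runs both tests on the same pairs:

```python
import numpy as np
from scipy.stats import ttest_rel, wilcoxon

# Hypothetical paired scores: nine modest improvements and one extreme drop.
before = np.array([50, 52, 48, 55, 51, 49, 53, 47, 54, 150])
after = before + np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, -100])

t_result = ttest_rel(after, before)
w_result = wilcoxon(after, before)

# The outlier inflates the variance that the t-test relies on, while the
# signed-rank test only sees it as the largest rank.
print("paired t-test p-value:", t_result.pvalue)
print("Wilcoxon p-value:     ", w_result.pvalue)
```

On this data the single outlier dominates the t-test, while the rank-based test is affected much less. Neither result says which test is appropriate; that still depends on the study design and the assumptions you are willing to make.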
Common Pitfalls
#1 Including zero differences in ranking
Wrong approach:
import numpy as np
from scipy.stats import rankdata
differences = np.array([0, 2, -3, 0, 5])
ranks = rankdata(np.abs(differences))  # zeros are ranked along with everything else
Correct approach:
import numpy as np
from scipy.stats import rankdata
differences = np.array([2, -3, 5])  # zeros removed first
ranks = rankdata(np.abs(differences))
Root cause: Not realizing that zero differences provide no information about direction.
#2 Using the Wilcoxon test on independent samples
Wrong approach: scipy.stats.wilcoxon(sample1, sample2)  # samples are independent
Correct approach: scipy.stats.mannwhitneyu(sample1, sample2)  # for independent samples
Root cause: Confusing paired and independent sample tests.
#3 Ignoring ties in absolute differences
Wrong approach: Assign ranks ignoring ties, e.g., ranks 1, 2, 3 for values [1, 1, 2]
Correct approach: Assign average ranks for ties, e.g., ranks 1.5, 1.5, 3 for values [1, 1, 2]
Root cause: Not knowing how to handle tied ranks properly.
Key Takeaways
The Wilcoxon signed-rank test compares paired data by ranking the absolute differences and considering their signs.
It is a non-parametric test that does not assume normal distribution, making it useful for skewed or small datasets.
Zero differences are excluded, and ties are handled by averaging ranks to maintain fairness.
SciPy's implementation can compute exact p-values for small samples without zeros or ties, and uses a normal approximation, with an optional continuity correction, for larger samples.
Choosing the Wilcoxon test over parametric alternatives depends on data characteristics and study design.