
Kolmogorov-Smirnov test in SciPy - Step-by-Step Execution


Concept Flow - Kolmogorov-Smirnov test

Start with two samples or sample and distribution

↓

Calculate empirical CDFs

↓

Find max difference D between CDFs

↓

Calculate p-value from D and sample sizes

↓

Compare p-value to significance level

↓

p > alpha?

↓

Yes: fail to reject H0. No (p <= alpha): reject H0.

The test compares two distributions by measuring the largest vertical distance between their cumulative distribution functions, then uses a p-value to judge whether that distance is larger than chance alone would plausibly produce.
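The first three steps of the flow can be sketched by hand with NumPy. This is a minimal illustration, not SciPy's internal implementation; the helper names `ecdf_values` and `ks_statistic` are made up for this example.

```python
import numpy as np

def ecdf_values(sample, grid):
    """Fraction of sample values <= each grid point (the empirical CDF)."""
    sample = np.sort(np.asarray(sample))
    # side="right" counts the values less than or equal to each grid point
    return np.searchsorted(sample, grid, side="right") / sample.size

def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    # Evaluate both ECDFs on the pooled observations, where all jumps occur
    grid = np.sort(np.concatenate([sample_a, sample_b]))
    cdf_a = ecdf_values(sample_a, grid)
    cdf_b = ecdf_values(sample_b, grid)
    return float(np.max(np.abs(cdf_a - cdf_b)))

d = ks_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(round(d, 6))
```

Both ECDFs are evaluated on the same pooled grid, which is what makes the pointwise difference meaningful. The result is approximately 0.2, up to floating-point rounding.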

Execution Sample


from scipy.stats import ks_2samp

sample1 = [1, 2, 3, 4, 5]
sample2 = [2, 3, 4, 5, 6]

result = ks_2samp(sample1, sample2)
print(result)

This code compares two small samples to see if they come from the same distribution using the Kolmogorov-Smirnov test.
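As a possible follow-up, the result object exposes the statistic and p-value as attributes, so the decision step can be written out explicitly. The value `alpha = 0.05` and the variable `decision` are choices made for this sketch.

```python
from scipy.stats import ks_2samp

sample1 = [1, 2, 3, 4, 5]
sample2 = [2, 3, 4, 5, 6]
alpha = 0.05  # significance level, fixed before looking at the data

result = ks_2samp(sample1, sample2)
statistic, pvalue = result.statistic, result.pvalue

# Decision rule from the concept flow
if pvalue > alpha:
    decision = "fail to reject H0"
else:
    decision = "reject H0"

print(f"D = {statistic:.3f}, p = {pvalue:.3f}: {decision}")
```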

Execution Table

Step	Action	Value/Calculation	Result
1	Input samples	sample1=[1,2,3,4,5], sample2=[2,3,4,5,6]	Samples ready
2	Calculate empirical CDFs	CDF1 and CDF2 evaluated at x=1..6	CDF1=[0.2,0.4,0.6,0.8,1.0,1.0], CDF2=[0.0,0.2,0.4,0.6,0.8,1.0]
3	Find max difference D	max|CDF1 - CDF2|	D=0.2
4	Calculate p-value	p based on D and sample sizes	p=1.0
5	Compare p-value to alpha=0.05	p=1.0 > 0.05	Fail to reject H0
6	Conclusion	No evidence that the distributions differ	Test result: statistic=0.2, pvalue=1.0

💡 The p-value is greater than 0.05, so we fail to reject the null hypothesis that both samples come from the same distribution.
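To see why step 4 yields exactly 1.0, the p-value can be reproduced with the classical closed-form expression for two samples of equal size n. This is a sketch under stated assumptions: the formula assumes no ties, the function name `exact_two_sided_pvalue` is hypothetical, and it is not claimed to be the code path SciPy takes internally.

```python
from math import comb

def exact_two_sided_pvalue(n, h):
    """P(D >= h/n) for two samples of equal size n, assuming no ties.

    h is the observed statistic scaled to an integer: h = D * n.
    """
    if h <= 0:
        return 1.0
    total = comb(2 * n, n)  # number of ways to interleave the two samples
    acc = 0
    for k in range(1, n // h + 1):
        sign = 1 if k % 2 == 1 else -1  # alternating-sign series
        acc += sign * comb(2 * n, n - k * h)
    return 2 * acc / total

p = exact_two_sided_pvalue(n=5, h=1)  # D = 0.2 with n = 5 gives h = 1
print(p)  # 1.0
```

With five observations per sample, D = 0.2 is the smallest value the statistic can take when there are no ties, so every possible arrangement is at least this extreme and the p-value is 1.0.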

Variable Tracker

Variable	Start	After Step 2	After Step 3	After Step 4	Final
sample1	[1,2,3,4,5]	[1,2,3,4,5]	[1,2,3,4,5]	[1,2,3,4,5]	[1,2,3,4,5]
sample2	[2,3,4,5,6]	[2,3,4,5,6]	[2,3,4,5,6]	[2,3,4,5,6]	[2,3,4,5,6]
CDF1	N/A	[0.2,0.4,0.6,0.8,1.0,1.0]	[0.2,0.4,0.6,0.8,1.0,1.0]	[0.2,0.4,0.6,0.8,1.0,1.0]	[0.2,0.4,0.6,0.8,1.0,1.0]
CDF2	N/A	[0.0,0.2,0.4,0.6,0.8,1.0]	[0.0,0.2,0.4,0.6,0.8,1.0]	[0.0,0.2,0.4,0.6,0.8,1.0]	[0.0,0.2,0.4,0.6,0.8,1.0]
D	N/A	N/A	0.2	0.2	0.2
p-value	N/A	N/A	N/A	1.0	1.0

Key Moments - 3 Insights

Why do we compare the p-value to 0.05?

What does the D statistic represent?

Why do we fail to reject the null hypothesis when p is high?

Visual Quiz - 3 Questions

Test your understanding

Look at the execution table at step 3, what is the value of the D statistic?

A. 1.0

B. 0.05

C. 0.2

D. 0.8

Concept Snapshot

The Kolmogorov-Smirnov test compares two samples by their empirical cumulative distributions.
Calculate the max difference D between the CDFs.
Compute the p-value from D and the sample sizes.
If p > 0.05, there is no evidence that the samples come from different distributions (fail to reject H0).
If p <= 0.05, the samples differ significantly (reject H0).
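For contrast, here is a small sketch of the rejecting branch. The two samples are hypothetical data chosen so that they do not overlap at all.

```python
from scipy.stats import ks_2samp

low = [1, 2, 3, 4, 5]
high = [6, 7, 8, 9, 10]  # hypothetical sample with no overlap with `low`

result = ks_2samp(low, high)
alpha = 0.05

# One ECDF reaches 1.0 before the other leaves 0.0, so D is at its maximum
print(result.statistic)
print(result.pvalue <= alpha)
```

Here D reaches its maximum of 1.0 and the p-value falls below 0.05, so H0 is rejected.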

Full Transcript

The Kolmogorov-Smirnov test compares two samples, or a sample and a reference distribution, through their cumulative distribution functions (CDFs); for samples it uses the empirical CDF. It finds the largest difference D between these CDFs. Using D and the sample sizes, it calculates a p-value, which tells us whether a difference this large is statistically significant. If p is greater than 0.05, we fail to reject the null hypothesis: the data give no evidence that the samples come from different distributions. If p is less than or equal to 0.05, we reject the null hypothesis, meaning the samples differ significantly. The example code uses SciPy's ks_2samp function to perform this test on two small samples, giving D = 0.2 and a p-value of 1.0, so the test finds no evidence of a difference between them.
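The sample-versus-distribution case mentioned above can be sketched with `scipy.stats.kstest`. The data are simulated for illustration only; the seed, sample size, and choice of a standard normal reference are assumptions of this example.

```python
import numpy as np
from scipy.stats import kstest

# Simulated data: 200 draws from a standard normal (illustrative only)
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=200)

# Compare the sample's empirical CDF with the standard normal CDF
result = kstest(sample, "norm")
print(result.statistic, result.pvalue)
```

The same decision rule applies: compare `result.pvalue` with the chosen significance level.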