Kolmogorov-Smirnov test in SciPy - Time & Space Complexity
We want to understand how the Kolmogorov-Smirnov test's running time grows with the size of the data: how does execution time increase as we add more data points?
Analyze the time complexity of the following code snippet.
```python
from scipy import stats

# Sample data arrays
sample1 = [1.2, 2.3, 3.1, 4.7, 5.5]
sample2 = [1.0, 2.5, 3.0, 4.8, 5.2]

# Perform the two-sample Kolmogorov-Smirnov test
result = stats.ks_2samp(sample1, sample2)
print(result)
```
This code compares two samples to check if they come from the same distribution using the Kolmogorov-Smirnov test.
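The returned result bundles the test statistic and p-value together; a small usage sketch (the attribute access below follows SciPy's documented result interface for `ks_2samp`):

```python
from scipy import stats

sample1 = [1.2, 2.3, 3.1, 4.7, 5.5]
sample2 = [1.0, 2.5, 3.0, 4.8, 5.2]

result = stats.ks_2samp(sample1, sample2)

# statistic: the maximum vertical distance between the two empirical CDFs
# pvalue: probability of seeing a distance at least this large if both
#         samples came from the same distribution
print(f"KS statistic: {result.statistic:.3f}")
print(f"p-value:      {result.pvalue:.3f}")
```

A small statistic and a large p-value mean the test found no evidence that the samples come from different distributions.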
Identify the repeated work: loops, recursion, and array traversals.
- Primary operation: sorting both input arrays, then comparing their empirical cumulative distribution functions (ECDFs).
- How many times: sorting touches each element roughly log n times (O(n log n) comparisons overall); the ECDF comparison is a single linear pass over the pooled values.
As the number of data points increases, the total work grows slightly faster than linearly, because the sorting step dominates.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 30 operations |
| 100 | About 700 operations |
| 1000 | About 10,000 operations |
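The table's numbers line up with n · log2(n) as a rough proxy for sorting cost (these are approximations, not exact operation counts):

```python
import math

# n * log2(n) as a rough proxy for sorting work at each input size
for n in (10, 100, 1000):
    print(n, round(n * math.log2(n)))
```

This prints roughly 33, 664, and 9966, matching the table's approximations.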
Pattern observation: The operations grow a bit faster than the input size, roughly like n log n, because of sorting.
Time Complexity: O(n log n)
This means the time to run the test grows a bit faster than the number of data points, mainly due to sorting.
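To see where the sorting cost comes from, here is a hypothetical sketch of how a two-sample KS statistic can be computed (an illustration of the idea, not SciPy's actual implementation):

```python
import numpy as np

def ks_statistic(x, y):
    """Sketch of the two-sample KS statistic. Sorting dominates:
    two O(n log n) sorts, then the ECDF comparison via binary search."""
    x = np.sort(np.asarray(x, dtype=float))   # O(n log n)
    y = np.sort(np.asarray(y, dtype=float))   # O(m log m)
    pooled = np.concatenate([x, y])
    # ECDF of each sample evaluated at every pooled point
    cdf_x = np.searchsorted(x, pooled, side="right") / len(x)
    cdf_y = np.searchsorted(y, pooled, side="right") / len(y)
    # KS statistic: the largest gap between the two ECDFs
    return np.max(np.abs(cdf_x - cdf_y))

print(ks_statistic([1.2, 2.3, 3.1, 4.7, 5.5],
                   [1.0, 2.5, 3.0, 4.8, 5.2]))
```

Everything after the two sorts is a linear-time (or n log n binary-search) pass, so the sorts set the overall O(n log n) bound.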
[X] Wrong: "The test runs in linear time because it just compares values."
[OK] Correct: The test needs to sort the data first, which takes more time than just comparing values one by one.
Understanding how sorting drives the cost lets you explain why statistical tests slow down on bigger data, and shows you know what the algorithm is doing behind the scenes.
"What if the input arrays were already sorted? How would the time complexity change?"