0
0
SciPydata~5 mins

Kolmogorov-Smirnov test in SciPy - Time & Space Complexity

Choose your learning style9 modes available
Time Complexity: Kolmogorov-Smirnov test
O(n log n)
Understanding Time Complexity

We want to understand how the time needed to run the Kolmogorov-Smirnov test changes as the size of the data grows.

How does the test's execution time increase when we have more data points?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.


from scipy import stats

# Sample data arrays
sample1 = [1.2, 2.3, 3.1, 4.7, 5.5]
sample2 = [1.0, 2.5, 3.0, 4.8, 5.2]

# Perform Kolmogorov-Smirnov test
result = stats.ks_2samp(sample1, sample2)
print(result)
    

This code compares two samples to check if they come from the same distribution using the Kolmogorov-Smirnov test.

Identify Repeating Operations

Identify the loops, recursion, array traversals that repeat.

  • Primary operation: Sorting both input arrays and comparing their cumulative distributions.
  • How many times: Each array is traversed multiple times during sorting and comparison, roughly proportional to the size of the arrays.
How Execution Grows With Input

As the number of data points increases, the time to sort and compare grows faster than just adding more points.

Input Size (n)Approx. Operations
10About 30 operations
100About 700 operations
1000About 10,000 operations

Pattern observation: The operations grow a bit faster than the input size, roughly like n log n, because of sorting.

Final Time Complexity

Time Complexity: O(n log n)

This means the time to run the test grows a bit faster than the number of data points, mainly due to sorting.

Common Mistake

[X] Wrong: "The test runs in linear time because it just compares values."

[OK] Correct: The test needs to sort the data first, which takes more time than just comparing values one by one.

Interview Connect

Understanding how sorting affects time helps you explain why some tests take longer with bigger data, showing you know how algorithms work behind the scenes.

Self-Check

"What if the input arrays were already sorted? How would the time complexity change?"