Percentiles and quantiles in NumPy/SciPy - Time & Space Complexity
We want to know how the time to calculate percentiles or quantiles changes as the data size grows.
How does the work increase when we have more numbers to analyze?
Analyze the time complexity of the following code snippet.
```python
import numpy as np

data = np.random.rand(1000)
percentile_90 = np.percentile(data, 90)
quantile_25 = np.quantile(data, 0.25)
```
This code generates 1000 random numbers and finds the 90th percentile and the 0.25 quantile (equivalent to the 25th percentile).
Identify the loops, recursion, or array traversals that repeat.
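To see why sorting is the dominant cost, here is a minimal sketch of computing a percentile by hand. The function name `percentile_by_sorting` is illustrative; the linear interpolation step mirrors `np.percentile`'s default `"linear"` method, so the two results should agree.

```python
import numpy as np

def percentile_by_sorting(data, q):
    # Sorting is the expensive step: O(n log n) comparisons on average.
    s = np.sort(data)
    # Fractional index of the q-th percentile in the sorted array,
    # then linear interpolation between the two neighboring values.
    idx = (q / 100) * (len(s) - 1)
    lo = int(np.floor(idx))
    hi = int(np.ceil(idx))
    frac = idx - lo
    return s[lo] * (1 - frac) + s[hi] * frac

rng = np.random.default_rng(0)
data = rng.random(1000)
print(np.isclose(percentile_by_sorting(data, 90), np.percentile(data, 90)))  # True
```

Everything besides the sort (indexing and interpolation) is constant-time work, which is why the sort dictates the overall complexity.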
- Primary operation: Sorting the array to find the percentile or quantile.
- How many times: The sorting happens once per percentile or quantile calculation.
Sorting takes more time as the list gets longer, but not in direct proportion: doubling the data slightly more than doubles the work.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 30 operations |
| 100 | About 700 operations |
| 1000 | About 10,000 operations |
Pattern observation: The operations grow faster than the input size, but far more slowly than the square of the input size.
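The figures in the table come from the n · log₂(n) estimate for comparison-based sorting, which you can verify directly (the table rounds these to friendlier numbers):

```python
import math

# Approximate operation counts for comparison-based sorting: n * log2(n).
for n in (10, 100, 1000):
    print(n, round(n * math.log2(n)))
# 10 -> 33, 100 -> 664, 1000 -> 9966
```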
Time Complexity: O(n log n)
This means the time to find percentiles grows a bit faster than the number of data points because of sorting.
[X] Wrong: "Finding a percentile is just picking one number, so it should be very fast regardless of data size."
[OK] Correct: To find the correct percentile, the data usually needs to be sorted first, which takes more time as data grows.
Understanding how sorting affects percentile calculations helps you explain performance in data analysis tasks clearly and confidently.
What if we used an approximate method that does not sort the entire data? How would the time complexity change?
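One way to avoid a full sort is selection: `np.partition` places the k-th smallest element in its sorted position in average O(n) time without ordering the rest of the array. The sketch below uses this to recover the 0.25 quantile exactly; the variable names are illustrative, and whether this helps in practice depends on how many quantiles you need from the same data.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random(100_000)

# Fractional index of the 0.25 quantile in sorted order.
idx = 0.25 * (len(data) - 1)
lo = int(idx)
frac = idx - lo

# Partition so that positions lo and lo+1 hold their sorted values;
# the rest of the array stays unsorted. Average cost: O(n).
part = np.partition(data, [lo, lo + 1])
q25 = part[lo] * (1 - frac) + part[lo + 1] * frac

print(np.isclose(q25, np.quantile(data, 0.25)))  # True
```

With selection the time complexity drops from O(n log n) to O(n) on average, at the cost of re-partitioning for each new quantile rather than reusing one sorted copy.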