Descriptive statistics (describe) in SciPy - Time & Space Complexity
We want to understand how the time needed to get descriptive statistics changes as the data size grows.
How does the work increase when we have more numbers to summarize?
Analyze the time complexity of the following code snippet.
```python
from scipy import stats
import numpy as np

data = np.random.rand(1000)
result = stats.describe(data)
print(result)
```
This code calculates summary statistics for an array of numbers: the number of observations, minimum and maximum, mean, variance, skewness, and kurtosis. (Note that `stats.describe` does not report percentiles; those require a separate call such as `np.percentile`.)
Identify the loops, recursion, and array traversals that repeat.
- Primary operation: Scanning through the data array to compute statistics.
- How many times: each element is visited a small, fixed number of times (roughly once per statistic), so the total work is a constant multiple of n.
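The passes can be sketched explicitly. This is a simplified model of the work `stats.describe` does, not its actual internal implementation: each statistic below is one linear scan over the array, so the total is a constant number of passes times n.

```python
import numpy as np

data = np.random.rand(1000)

n = data.size                                    # metadata only, O(1)
minimum, maximum = data.min(), data.max()        # two scans, O(n) each
mean = data.sum() / n                            # one scan, O(n)
variance = ((data - mean) ** 2).sum() / (n - 1)  # one more scan (ddof=1)
```

Four or five passes over n elements is still O(n): constant factors do not change the growth class.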
As the number of data points increases, the time to compute statistics grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 to 50 operations |
| 100 | About 100 to 500 operations |
| 1000 | About 1000 to 5000 operations |
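A quick way to check the pattern empirically is to time `stats.describe` on arrays of increasing size. The absolute numbers below depend on your machine; the roughly tenfold growth in time per tenfold growth in data is the point.

```python
import time
import numpy as np
from scipy import stats

stats.describe(np.random.rand(100))  # warm-up call so setup cost isn't timed

timings = []
for n in (10_000, 100_000, 1_000_000):
    data = np.random.rand(n)
    start = time.perf_counter()
    stats.describe(data)
    timings.append(time.perf_counter() - start)
    print(f"n={n:>9,}: {timings[-1] * 1e3:.3f} ms")
```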
Pattern observation: The work grows roughly linearly with the number of data points.
Time Complexity: O(n)
This means the time to get descriptive statistics grows directly with the size of the data.
[X] Wrong: "Calculating descriptive statistics takes the same time no matter how much data there is."
[OK] Correct: The function must look at each number to compute summaries, so more data means more work.
Understanding how summary calculations scale helps you explain efficiency when working with large datasets.
"What if we used a streaming method that updates statistics without storing all data? How would the time complexity change?"