Descriptive statistics (describe) in SciPy - Time & Space Complexity
We want to understand how the time needed to get descriptive statistics changes as the data size grows.
How does the work increase when we have more numbers to summarize?
Analyze the time complexity of the following code snippet.
```python
from scipy import stats
import numpy as np

data = np.random.rand(1000)
result = stats.describe(data)
print(result)
```
This code calculates summary statistics for an array of numbers: the number of observations, minimum and maximum, mean, variance, skewness, and kurtosis. (Note that `stats.describe` does not report percentiles; those require a separate call such as `np.percentile`.)
Identify the loops, recursion, and array traversals that repeat.
- Primary operation: Scanning through the data array to compute statistics.
- How many times: each element is visited a small, fixed number of times (roughly once per statistic), so the total work is a constant multiple of n.
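The passes can be sketched explicitly. This is a simplified model of the work `stats.describe` does, not its actual internal implementation: each statistic below is one linear scan over the array, so the total is a constant number of passes times n.

```python
import numpy as np

data = np.random.rand(1000)

n = data.size                                    # metadata only, O(1)
minimum, maximum = data.min(), data.max()        # two scans, O(n) each
mean = data.sum() / n                            # one scan, O(n)
variance = ((data - mean) ** 2).sum() / (n - 1)  # one more scan (ddof=1)
```

Four or five passes over n elements is still O(n): constant factors do not change the growth class.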
As the number of data points increases, the time to compute statistics grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 to 50 operations |
| 100 | About 100 to 500 operations |
| 1000 | About 1000 to 5000 operations |
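A quick way to check the pattern empirically is to time `stats.describe` on arrays of increasing size. The absolute numbers below depend on your machine; the roughly tenfold growth in time per tenfold growth in data is the point.

```python
import time
import numpy as np
from scipy import stats

stats.describe(np.random.rand(100))  # warm-up call so setup cost isn't timed

timings = []
for n in (10_000, 100_000, 1_000_000):
    data = np.random.rand(n)
    start = time.perf_counter()
    stats.describe(data)
    timings.append(time.perf_counter() - start)
    print(f"n={n:>9,}: {timings[-1] * 1e3:.3f} ms")
```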
Pattern observation: The work grows roughly linearly with the number of data points.
Time Complexity: O(n)
This means the time to get descriptive statistics grows directly with the size of the data.
[X] Wrong: "Calculating descriptive statistics takes the same time no matter how much data there is."
[OK] Correct: The function must look at each number to compute summaries, so more data means more work.
Understanding how summary calculations scale helps you explain efficiency when working with large datasets.
"What if we used a streaming method that updates statistics without storing all data? How would the time complexity change?"