Normal distribution in SciPy - Time & Space Complexity
We want to understand how the time required to work with the normal distribution changes as the number of data points grows.
Specifically, how does the running time scale when we calculate probabilities for, or generate, many random values?
Analyze the time complexity of the following code snippet.
```python
from scipy.stats import norm

# Generate 1000 random values from a standard normal distribution
samples = norm.rvs(loc=0, scale=1, size=1000)

# Calculate the probability density function for each sample
pdf_values = norm.pdf(samples, loc=0, scale=1)
```
This code generates random values from a normal distribution and then calculates the probability density for each value.
Identify the repeated operations: loops, recursion, or array traversals.
- Primary operation: Calculating the probability density function (pdf) for each sample.
- How many times: Once for each of the 1000 samples (or n samples in general).
Each additional sample requires one pdf evaluation. Although `norm.pdf` is vectorized (there is no explicit Python loop), NumPy still performs one calculation per array element, so the total work grows directly with the number of samples.
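Conceptually, the vectorized call does the same work as an explicit loop over the samples. A minimal pure-Python sketch of that per-sample work, using the standard normal density formula, might look like this (the helper `normal_pdf` is ours, not part of SciPy):

```python
import math

def normal_pdf(x, loc=0.0, scale=1.0):
    # Density of the normal distribution N(loc, scale^2) at x
    z = (x - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2 * math.pi))

samples = [0.0, 0.5, 1.0, -1.3]
# One evaluation per sample -> total work proportional to len(samples)
pdf_values = [normal_pdf(x) for x in samples]
```

Counting the evaluations in this loop is exactly the counting done in the table below.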
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 pdf calculations |
| 100 | 100 pdf calculations |
| 1000 | 1000 pdf calculations |
Pattern observation: The number of operations grows linearly with the number of samples.
Time Complexity: O(n)
This means the computation time grows in direct proportion to the number of samples n.
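A quick way to check the linear trend empirically is to time `norm.pdf` at increasing sample sizes. This is a rough sketch; exact timings depend on your machine, but the times should grow roughly tenfold as n does:

```python
import timeit
from scipy.stats import norm

times = []
for n in (1_000, 10_000, 100_000):
    samples = norm.rvs(loc=0, scale=1, size=n)
    # Time 5 repeated pdf evaluations over n samples
    elapsed = timeit.timeit(lambda: norm.pdf(samples, loc=0, scale=1), number=5)
    times.append(elapsed)
    print(f"n={n:>7}: {elapsed:.5f} s")
```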
[X] Wrong: "Calculating the pdf for many samples takes the same time as for one sample because the function is fast."
[OK] Correct: Even if one calculation is quick, doing it many times adds up, so time grows with the number of samples.
Understanding how time grows with data size helps you explain your code's efficiency clearly and confidently in real-world tasks.
"What if we calculate the cumulative distribution function (cdf) instead of the pdf for each sample? How would the time complexity change?"