Probability density and cumulative functions in SciPy - Time & Space Complexity
We want to understand how the time to compute probability density and cumulative distribution functions scales as we evaluate them at more points: how does the work grow when we ask for more values?
Analyze the time complexity of the following code snippet.
```python
import numpy as np
from scipy.stats import norm

# Evaluate the standard normal pdf and cdf at 1000 evenly spaced points
x = np.linspace(-3, 3, 1000)
pdf_values = norm.pdf(x)
cdf_values = norm.cdf(x)
```
This code evaluates the probability density function (pdf) and cumulative distribution function (cdf) of a standard normal distribution at 1000 points.
Identify the loops, recursion, or array traversals — any operation that repeats.
- Primary operation: Computing pdf and cdf values for each point in the array.
- How many times: Once for each of the 1000 points (length of x).
Each additional point requires one more calculation for pdf and one more for cdf.
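To see the per-point work concretely, the standard normal pdf can be computed by hand from its closed-form formula, f(x) = exp(-x²/2) / √(2π); a minimal sketch comparing it to `norm.pdf`:

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-3, 3, 5)

# Standard normal pdf evaluated via its closed-form formula:
# f(x) = exp(-x^2 / 2) / sqrt(2 * pi)
manual_pdf = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# norm.pdf performs the equivalent arithmetic for every element of x
print(np.allclose(manual_pdf, norm.pdf(x)))  # True
```

Whether you write the formula yourself or call the library, each input point requires its own exponentiation and division, which is where the per-point cost comes from.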
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 20 calculations (10 pdf + 10 cdf) |
| 100 | About 200 calculations |
| 1000 | About 2000 calculations |
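The table's pattern can be checked directly: each input point yields exactly one pdf value and one cdf value, so the total number of results is 2n. A quick sketch:

```python
import numpy as np
from scipy.stats import norm

for n in (10, 100, 1000):
    x = np.linspace(-3, 3, n)
    # One pdf value and one cdf value per input point -> 2n results in total
    total_values = norm.pdf(x).size + norm.cdf(x).size
    print(n, total_values)  # prints 10 20, then 100 200, then 1000 2000
```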
Pattern observation: The total work grows roughly in direct proportion to the number of points.
Time Complexity: O(n)
This means the time to compute pdf and cdf grows linearly with the number of points you evaluate.
[X] Wrong: "Calculating pdf and cdf for many points is constant time because the functions are built-in."
[OK] Correct: Even built-in, vectorized functions must compute each value separately, so more points mean more work; vectorization lowers the constant factor, not the growth rate.
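One way to check this empirically is to time `norm.pdf` at increasing input sizes. Absolute times are machine-dependent and the sizes below are arbitrary choices, but as n grows the trend should be roughly linear (at small n, fixed call overhead can mask it):

```python
import timeit

import numpy as np
from scipy.stats import norm

for n in (1_000, 10_000, 100_000):
    x = np.linspace(-3, 3, n)
    # Best-of-several timing to reduce noise; expect roughly 10x time per 10x n
    t = min(timeit.repeat(lambda: norm.pdf(x), number=20, repeat=3))
    print(f"n={n:>7}: {t:.6f} s")
```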
Understanding how function calls scale with input size helps you explain performance clearly and shows you can reason about efficiency in data science tasks.
"What if we used vectorized operations on GPU instead of CPU? How would the time complexity change?"