Random sampling distributions in NumPy - Time & Space Complexity
We want to understand how the time needed to create random samples changes as we ask for more samples.
How does the work grow when we increase the number of random values we generate?
Analyze the time complexity of the following code snippet.
```python
import numpy as np

# Generate n random samples from a normal distribution
n = 1000
samples = np.random.normal(loc=0, scale=1, size=n)

# Calculate the mean of the samples
mean_value = np.mean(samples)
```
This code generates n random numbers from a normal distribution and then finds their average.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Generating n random numbers and then calculating their mean.
- How many times: The random number generation and the mean calculation each process all n elements once.
As we ask for more samples, the work grows roughly in direct proportion to the number of samples.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 random draws and 10 values averaged |
| 100 | About 100 random draws and 100 values averaged |
| 1000 | About 1000 random draws and 1000 values averaged |
Pattern observation: Doubling the number of samples roughly doubles the work done.
Time Complexity: O(n)
This means the time needed grows in a straight line with the number of samples you want.
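You can check this empirically with a quick timing sketch. The absolute numbers will vary by machine; what matters is the ratio — each tenfold increase in n should take roughly ten times as long (the sample sizes below are arbitrary choices for illustration):

```python
import time
import numpy as np

def time_sampling(n):
    """Time generating n normal samples and averaging them."""
    start = time.perf_counter()
    samples = np.random.normal(loc=0, scale=1, size=n)
    mean_value = np.mean(samples)
    return time.perf_counter() - start

# Each tenfold increase in n should take roughly 10x as long
for n in (100_000, 1_000_000, 10_000_000):
    print(f"n = {n:>10,}: {time_sampling(n):.4f} s")
```

Very small values of n are dominated by fixed overhead (function calls, memory allocation), so the linear pattern is clearest once n is in the tens of thousands or more.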
[X] Wrong: "Generating random samples is instant and does not depend on how many samples I ask for."
[OK] Correct: Each sample requires some work, so more samples mean more time. The time grows directly with the number of samples.
Understanding how sampling time grows helps you explain performance when working with data simulations or bootstrapping in real projects.
"What if we generate samples in batches of fixed size instead of all at once? How would the time complexity change?"