Random seed management in MLOps - Time & Space Complexity
When managing random seeds in machine learning pipelines, it's important to understand how the time to set or use seeds grows as the number of operations increases.
We want to know how the cost changes when we repeat random operations with seed control.
Analyze the time complexity of the following code snippet.
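```python
import random

n = 1000  # example input size

for i in range(n):
    random.seed(i)           # reseed the global generator with i
    value = random.random()  # draw one float in [0.0, 1.0)
    # use value in pipeline
```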
On each of the n iterations (i = 0 to n-1), this code reseeds the generator with i and then draws one random number.
Identify the loops, recursion, and array traversals that repeat.
- Primary operation: Loop running from 0 to n-1.
- How many times: Exactly n times, each time setting a seed and generating one random number.
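Each iteration does a constant amount of work: one call to `random.seed(i)` and one call to `random.random()`. For small integer seeds like these, reseeding CPython's Mersenne Twister reinitializes a fixed-size internal state, so the cost of each call does not depend on n.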
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 seed sets + 10 random generations |
| 100 | 100 seed sets + 100 random generations |
| 1000 | 1000 seed sets + 1000 random generations |
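The counts in the table can be checked by instrumenting the loop. The sketch below uses a hypothetical helper, `count_operations`, that tallies seed calls and draws with plain integer counters; it is an illustration, not part of any library.

```python
import random


def count_operations(n: int) -> tuple[int, int]:
    """Run the seeded loop and return (seed_calls, draws).

    Hypothetical helper for illustration only.
    """
    seed_calls = 0
    draws = 0
    for i in range(n):
        random.seed(i)
        seed_calls += 1
        value = random.random()
        draws += 1
        # value would be used by the pipeline here
    return seed_calls, draws


for n in (10, 100, 1000):
    seeds, draws = count_operations(n)
    print(f"n={n}: {seeds} seed sets + {draws} random generations")
# prints:
# n=10: 10 seed sets + 10 random generations
# n=100: 100 seed sets + 100 random generations
# n=1000: 1000 seed sets + 1000 random generations
```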
Pattern observation: The total work grows in direct proportion to n: doubling n doubles the work.
Time Complexity: O(n)
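This means the running time grows linearly with the number of iterations. Extra space is O(1), since each iteration overwrites `value` (assuming the pipeline does not keep every value it receives).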
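A rough way to observe the linear growth empirically is to time the loop for a few doubling values of n. This is a minimal sketch using `time.perf_counter`; `time_seeded_loop` is a hypothetical name, and absolute timings depend on the machine and Python version. What to look for is total time roughly doubling while the per-iteration time stays about the same.

```python
import random
import time


def time_seeded_loop(n: int, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for i in range(n):
            random.seed(i)   # reseed every iteration, as in the snippet
            random.random()  # one draw per iteration
        best = min(best, time.perf_counter() - start)
    return best


for n in (5_000, 10_000, 20_000):
    t = time_seeded_loop(n)
    print(f"n={n:>6}: {t:.4f} s ({t / n * 1e6:.2f} us per iteration)")
```

Taking the best of several runs reduces noise from other processes; it does not remove it entirely, so small deviations from exact doubling are expected.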
[X] Wrong: "Reseeding inside the loop costs the same as setting the random seed once at the start."
[OK] Correct: Each iteration here resets the seed, so the seeding cost is paid n times. Setting the seed once before the loop pays that cost only once.
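The two seeding strategies can be compared side by side. In this sketch (function names are hypothetical), both versions produce n reproducible values, but the first makes n seed calls while the second makes one. They yield different sequences, since seeding once gives a single stream from one seed rather than the first draw of n separate seeds.

```python
import random


def values_reseed_each_iteration(n: int) -> list[float]:
    """Reseed before every draw: n seed calls + n draws."""
    values = []
    for i in range(n):
        random.seed(i)
        values.append(random.random())
    return values


def values_seed_once(n: int, seed: int = 42) -> list[float]:
    """Seed once, then draw n values from one stream: 1 seed call + n draws."""
    random.seed(seed)
    return [random.random() for _ in range(n)]


# Both are reproducible: the same call returns the same values.
assert values_reseed_each_iteration(5) == values_reseed_each_iteration(5)
assert values_seed_once(5) == values_seed_once(5)
```

Both functions store their values in a list so they can be compared, which uses O(n) space; the original loop does not need to.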
Understanding how repeated seed setting affects runtime helps you reason about reproducibility and performance in machine learning workflows.
"What if we set the random seed only once before the loop? How would the time complexity change?"
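One way to work through this question, assuming seeding costs some constant $c_s$ and each draw (plus its per-value pipeline work) costs some constant $c_r$; both symbols are introduced here for illustration:

```latex
\begin{aligned}
T_{\text{each}}(n) &= n\,(c_s + c_r) = \Theta(n) \\
T_{\text{once}}(n) &= c_s + n\,c_r = \Theta(n) \\
\lim_{n \to \infty} \frac{T_{\text{each}}(n)}{T_{\text{once}}(n)} &= \frac{c_s + c_r}{c_r}
\end{aligned}
```

Under these assumptions, seeding once removes n - 1 reseeds and shrinks the constant factor, but the time complexity stays O(n) because the n draws remain.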