What is the main reason to set a random seed when running machine learning experiments?
Think about why you want to get the same results every time you run your code.
Setting a random seed fixes the starting point for random number generation, making experiments reproducible.
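A minimal sketch of the idea: reseeding with the same value replays the exact same sequence of draws.

```python
import random

random.seed(42)
first_run = [random.random() for _ in range(3)]

random.seed(42)  # reseed with the same value
second_run = [random.random() for _ in range(3)]

# Identical seeds replay identical sequences of draws.
assert first_run == second_run
print(first_run[0])  # 0.6394267984578837 in CPython 3
```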
What is the output of the following Python code snippet?
import random
random.seed(42)
print([random.randint(1, 10) for _ in range(3)])
Run the code or recall the sequence generated by seed 42 in Python's random module.
With seed 42, the first three random integers between 1 and 10 in CPython 3 are 2, 1, and 5, so the code prints [2, 1, 5].
In a multi-step machine learning pipeline involving data shuffling, model initialization, and data augmentation, which approach best ensures reproducibility?
Consider how to control randomness in each step to get consistent results.
Seeding each step's random generator explicitly from a single recorded base seed ensures every source of randomness is controlled, making the whole pipeline reproducible.
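One way to sketch this with the stdlib's random.Random (the step names and seed offsets are illustrative): give each pipeline step its own generator derived from one base seed.

```python
import random

BASE_SEED = 42  # single seed recorded with the experiment

# One dedicated generator per pipeline step, each derived from the base seed,
# so one step's draws never depend on how much randomness another step consumed.
shuffle_rng = random.Random(BASE_SEED)
init_rng = random.Random(BASE_SEED + 1)
augment_rng = random.Random(BASE_SEED + 2)

data = list(range(10))
shuffle_rng.shuffle(data)                             # data shuffling
weights = [init_rng.gauss(0, 0.1) for _ in range(4)]  # model initialization
flip = augment_rng.random() < 0.5                     # augmentation decision
```

Because each step owns its generator, adding an extra draw in one step no longer shifts the random sequence seen by the others.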
You set torch.manual_seed(123) before training your model, but results differ between runs. What is the most likely cause?
Think about GPU operations and their randomness control.
CUDA introduces its own randomness: the GPU generators need seeding as well (torch.cuda.manual_seed_all), and some CUDA/cuDNN kernels are nondeterministic unless deterministic algorithms are enabled, so torch.manual_seed alone does not guarantee identical runs.
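A stdlib analogy of the failure mode, using two random.Random instances to stand in for the CPU and GPU generators (in real PyTorch code the fix involves torch.cuda.manual_seed_all and deterministic-algorithm settings):

```python
import random

def run(cpu_seed, gpu_seed=None):
    # cpu_rng is seeded; gpu_rng stands in for a generator we forgot to seed.
    cpu_rng = random.Random(cpu_seed)
    gpu_rng = random.Random(gpu_seed)  # None -> seeded from OS entropy
    return cpu_rng.random() + gpu_rng.random()

# Seeding only the "CPU" stream does not make the run reproducible:
print(run(123) == run(123))  # almost certainly False

# Seeding both streams does:
assert run(123, gpu_seed=456) == run(123, gpu_seed=456)
```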
In distributed training across multiple machines and GPUs, what is the best practice to manage random seeds to ensure reproducibility?
Consider how to balance reproducibility and independent randomness per process.
Deriving unique seeds per process from a base seed ensures reproducibility while avoiding identical random sequences across processes.
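A sketch of per-process seed derivation (base_seed, rank, and world_size are illustrative stand-ins for the values a distributed launcher would provide):

```python
import random

def make_rank_rng(base_seed: int, rank: int) -> random.Random:
    # Derive a distinct, deterministic seed for each process from one base seed.
    return random.Random(base_seed + rank)

base_seed, world_size = 42, 4
rngs = [make_rank_rng(base_seed, rank) for rank in range(world_size)]
draws = [rng.random() for rng in rngs]

# Every rank sees a different stream...
assert len(set(draws)) == world_size
# ...but rerunning with the same base seed reproduces each stream exactly.
assert draws == [make_rank_rng(base_seed, r).random() for r in range(world_size)]
```

base_seed + rank is the simplest possible derivation; for statistically independent streams, NumPy's SeedSequence.spawn is a more robust choice.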