Which of the following is the most important principle to ensure a machine learning training pipeline is reproducible?
Think about what guarantees the exact same results when you run the pipeline multiple times.
Reproducibility requires controlling randomness and tracking versions of data and code so the pipeline can be rerun with the same inputs and produce the same outputs.
Given this snippet setting a fixed random seed in Python for training, what will be the output of print(random.randint(1, 100)) if the seed is set to 42 before?
import random random.seed(42) print(random.randint(1, 100))
Python's random.seed sets the starting point for the random number generator.
With seed 42, the first call to random.randint(1, 100) returns 82 consistently.
Which Dockerfile snippet best ensures a reproducible training environment by fixing package versions?
Fixing package versions avoids unexpected changes in dependencies.
Specifying exact package versions in Dockerfile ensures the environment is consistent across builds.
What is the correct order of these steps in a reproducible ML training pipeline?
Think about what must be done before training starts and what should be logged during the process.
First version control code, then set seeds, log data versions, then run training and save outputs.
You set fixed random seeds in your training pipeline, but results differ each run. Which is the most likely cause?
Some hardware or library operations can introduce randomness even if seeds are fixed.
GPU operations and parallelism can cause non-deterministic behavior unless explicitly controlled.