Comparing experiment runs in MLOps - Step-by-Step Execution

Process Flow - Comparing experiment runs
1. Start: Select experiments
2. Fetch run data
3. Extract metrics and parameters
4. Align runs for comparison
5. Visualize differences
6. Analyze results and decide
This flow shows how experiment runs are selected, their data fetched and aligned, then compared visually to analyze differences.
Execution Sample
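runs = fetch_runs(['run1', 'run2'])   # fetch raw data for the selected runs
metrics = extract_metrics(runs)       # keep only the metrics of each run
comparison = compare(metrics)         # align the metrics and compute the differences
visualize_comparison(comparison)      # render the comparison as a chart or table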
This code fetches two experiment runs, extracts their metrics, compares them, and visualizes the differences.
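The sample above calls helpers it never defines. The following is a minimal sketch of what they might look like, assuming a hypothetical in-memory RUN_STORE in place of a real tracking server. The helper bodies, the baseline and candidate arguments of compare, and the parameter values are illustrative assumptions, not part of any library. The metric values are the ones used in the Status Tracker.

```python
# Hypothetical in-memory stand-in for a tracking store (not a real API).
RUN_STORE = {
    "run1": {"params": {"lr": 0.01}, "metrics": {"accuracy": 0.9, "loss": 0.1}, "tags": {}},
    "run2": {"params": {"lr": 0.001}, "metrics": {"accuracy": 0.85, "loss": 0.15}, "tags": {}},
}


def fetch_runs(run_ids):
    """Return the raw record (params, metrics, tags) for each requested run."""
    return {run_id: RUN_STORE[run_id] for run_id in run_ids}


def extract_metrics(runs):
    """Keep only the metrics dict of each run."""
    return {run_id: dict(data["metrics"]) for run_id, data in runs.items()}


def compare(metrics, baseline, candidate):
    """Return candidate minus baseline for every metric both runs logged."""
    shared = metrics[baseline].keys() & metrics[candidate].keys()
    return {
        name: round(metrics[candidate][name] - metrics[baseline][name], 6)
        for name in sorted(shared)
    }


runs = fetch_runs(["run1", "run2"])
metrics = extract_metrics(runs)
deltas = compare(metrics, baseline="run1", candidate="run2")
print(deltas)  # {'accuracy': -0.05, 'loss': 0.05}
```

A negative accuracy delta and a positive loss delta both say the same thing here: run2 is worse than run1 on these metrics.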
Process Table
| Step | Action | Input | Output | Notes |
|------|--------|-------|--------|-------|
| 1 | Select experiment runs | ['run1', 'run2'] | Runs selected | User chooses runs to compare |
| 2 | Fetch run data | Runs selected | Raw data for run1 and run2 | Data includes params, metrics, tags |
| 3 | Extract metrics | Raw data | Metrics dict for each run | Focus on key performance metrics |
| 4 | Align runs | Metrics dicts | Aligned metrics table | Ensures metrics correspond across runs |
| 5 | Visualize comparison | Aligned metrics | Comparison chart/table | Shows differences clearly |
| 6 | Analyze results | Comparison chart | Insights on performance | User decides best run |
| 7 | Exit | N/A | Comparison complete | Process ends |
💡 All selected runs compared and visualized for analysis
Status Tracker
| Variable | Start | After Step 2 | After Step 3 | After Step 4 | Final |
|----------|-------|--------------|--------------|--------------|-------|
| runs | [] | ['run1', 'run2'] | ['run1', 'run2'] | ['run1', 'run2'] | ['run1', 'run2'] |
| raw_data | {} | {'run1': {...}, 'run2': {...}} | {'run1': {...}, 'run2': {...}} | {'run1': {...}, 'run2': {...}} | {'run1': {...}, 'run2': {...}} |
| metrics | {} | {} | {'run1': {'accuracy': 0.9, 'loss': 0.1}, 'run2': {'accuracy': 0.85, 'loss': 0.15}} | {'accuracy': [0.9, 0.85], 'loss': [0.1, 0.15]} | {'accuracy': [0.9, 0.85], 'loss': [0.1, 0.15]} |
| comparison_chart | null | null | null | null | Rendered chart/table |
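The change in 'metrics' between Step 3 and Step 4 is a pivot from one dict per run to one list per metric. A minimal sketch of that pivot follows. The function name align_metrics is an assumption made for illustration.

```python
def align_metrics(per_run, run_order):
    """Pivot {run: {metric: value}} into {metric: [value per run, in run_order]}."""
    # Collect every metric name seen in any run so no column is dropped.
    names = sorted({name for run_id in run_order for name in per_run[run_id]})
    # .get() yields None where a run did not log a given metric.
    return {name: [per_run[run_id].get(name) for run_id in run_order] for name in names}


per_run = {
    "run1": {"accuracy": 0.9, "loss": 0.1},
    "run2": {"accuracy": 0.85, "loss": 0.15},
}
aligned = align_metrics(per_run, ["run1", "run2"])
print(aligned)  # {'accuracy': [0.9, 0.85], 'loss': [0.1, 0.15]}
```

Passing the run order explicitly keeps each list position tied to a known run, so the values stay comparable column by column.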
Key Moments - 3 Insights
Why do we need to align metrics before comparing runs?
Because different runs might log different sets of metrics, or log them in a different order. Aligning ensures we compare the same metrics side by side, as shown in step 4 of the Process Table.
What happens if a metric is missing in one run?
The alignment step handles missing metrics by marking them as absent or null, so the comparison chart can show gaps or differences clearly, preventing misleading results.
Why is visualization important after comparison?
Visualization helps quickly spot differences and trends between runs, making it easier to analyze results and decide which run performed better, as seen in step 5.
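The points about missing metrics and visualization can be combined in one small sketch: a plain-text comparison table in which a metric a run never logged shows up as a visible gap. The function name render_table, the 'n/a' marker, and the missing loss value for run2 are assumptions made for illustration.

```python
def render_table(per_run, run_order):
    """Build a plain-text comparison table; unlogged metrics are shown as 'n/a'."""
    names = sorted({name for run_id in run_order for name in per_run[run_id]})
    header = ["metric"] + list(run_order)
    rows = [header]
    for name in names:
        row = [name]
        for run_id in run_order:
            value = per_run[run_id].get(name)  # None when the run never logged it
            row.append("n/a" if value is None else f"{value:.2f}")
        rows.append(row)
    # Pad every column to its widest cell so the values line up.
    widths = [max(len(r[i]) for r in rows) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip() for r in rows
    )


per_run = {
    "run1": {"accuracy": 0.9, "loss": 0.1},
    "run2": {"accuracy": 0.85},  # hypothetical: loss was not logged for this run
}
table = render_table(per_run, ["run1", "run2"])
print(table)
```

Showing the gap explicitly avoids the misleading alternative of silently dropping the metric or treating the missing value as zero.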
Visual Quiz - 3 Questions
Test your understanding
Look at the Process Table. At which step are the metrics aligned for comparison?
A. Step 5
B. Step 3
C. Step 4
D. Step 2
💡 Hint
Check the 'Action' column of the Process Table for the row labelled 'Align runs'.
According to the Status Tracker, what is the value of 'metrics' after Step 3?
A. Empty dictionary {}
B. {'run1': {'accuracy': 0.9, 'loss': 0.1}, 'run2': {'accuracy': 0.85, 'loss': 0.15}}
C. Raw data for runs
D. Rendered chart/table
💡 Hint
Look at the 'metrics' row under the 'After Step 3' column of the Status Tracker.
If a new run 'run3' is added, which steps need to be repeated to include it in the comparison?
A. All steps from 1 to 5
B. Steps 1 and 2
C. Step 1 only
D. Steps 3 and 4
💡 Hint
Adding a run affects selection, fetching data, extracting metrics, aligning, and visualization.
Concept Snapshot
Comparing experiment runs:
1. Select runs to compare
2. Fetch their data (params, metrics)
3. Extract key metrics
4. Align metrics across runs
5. Visualize differences
6. Analyze to choose best run
Full Transcript
This visual execution shows how to compare experiment runs step-by-step. First, runs are selected. Then their raw data is fetched, including parameters and metrics. Next, key metrics are extracted from each run. These metrics are aligned so that the same metrics from different runs are side by side. After alignment, a visualization like a chart or table is created to show differences clearly. Finally, the user analyzes the visualization to decide which run performed better. Variables like 'runs', 'raw_data', and 'metrics' change as the process moves forward. Key moments include understanding why alignment is needed, handling missing metrics, and the importance of visualization. The quiz tests understanding of steps and variable states.

Practice

1.

What is the main purpose of comparing experiment runs in MLOps?

easy
A. To identify which model performs best by reviewing their results side by side
B. To delete old experiment runs to save space
C. To create new experiment runs automatically
D. To change the code of the model during training

Solution

  1. Step 1: Understand experiment runs

    Experiment runs record model training results and metrics.
  2. Step 2: Purpose of comparing runs

    Comparing runs helps see which model version performs better by looking at their results side by side.
  3. Final Answer:

    To identify which model performs best by reviewing their results side by side -> Option A
  4. Quick Check:

    Comparing runs = find best model [OK]
Hint: Comparing runs means checking results to pick the best model
Common Mistakes:
  • Thinking comparing runs deletes data
  • Confusing comparing with creating runs
  • Believing comparing changes model code
2.

Which command syntax correctly compares two experiment runs with IDs run1 and run2 under experiment exp123?

easy
A. mlflow compare runs --experiment exp123 --ids run1,run2
B. mlflow experiments compare-runs --experiment-id exp123 --run-ids run1 run2
C. mlflow compare-runs --experiment exp123 --run-ids run1 run2
D. mlflow experiments compare --experiment-id exp123 --runs run1 run2

Solution

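  1. Step 1: Check the command format used in this lesson

    The syntax taught here is 'mlflow experiments compare-runs' with the '--experiment-id' and '--run-ids' flags. MLflow's published CLI reference does not list a 'compare-runs' subcommand, so confirm it with 'mlflow experiments --help' before relying on it. Runs are normally compared in the MLflow UI or through the Python API.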
  2. Step 2: Match options to syntax

    mlflow experiments compare-runs --experiment-id exp123 --run-ids run1 run2 matches the correct syntax exactly with proper flags and parameters.
  3. Final Answer:

    mlflow experiments compare-runs --experiment-id exp123 --run-ids run1 run2 -> Option B
  4. Quick Check:

    Correct command syntax = mlflow experiments compare-runs --experiment-id exp123 --run-ids run1 run2 [OK]
Hint: Use 'mlflow experiments compare-runs' with correct flags
Common Mistakes:
  • Using wrong flags like --runs instead of --run-ids
  • Mixing command order or names
  • Separating run IDs with commas instead of spaces
3.

Given two runs with metrics:
run1: accuracy=0.85, loss=0.35
run2: accuracy=0.88, loss=0.40
Which run is better if accuracy is the main metric?

medium
A. run1 because it has higher accuracy
B. run1 because it has lower loss
C. run2 because it has higher accuracy
D. run2 because it has lower loss

Solution

  1. Step 1: Identify main metric

    The question states accuracy is the main metric to compare runs.
  2. Step 2: Compare accuracy values

    run1 accuracy = 0.85, run2 accuracy = 0.88. Higher accuracy is better.
  3. Final Answer:

    run2 because it has higher accuracy -> Option C
  4. Quick Check:

    Main metric accuracy = higher is better [OK]
Hint: Focus on main metric value to pick best run
Common Mistakes:
  • Choosing run with lower loss when accuracy is main metric
  • Confusing higher and lower metric values
  • Ignoring stated main metric
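The selection rule in this question can be written as a short helper. This is a sketch under the assumption that each run's metrics are available as a plain dict. The name best_run and the higher_is_better switch are illustrative.

```python
def best_run(per_run, metric, higher_is_better=True):
    """Return the id of the run with the best value for `metric`."""
    # Ignore runs that never logged the chosen metric.
    candidates = {run_id: m[metric] for run_id, m in per_run.items() if metric in m}
    if not candidates:
        raise ValueError(f"no run logged {metric!r}")
    pick = max if higher_is_better else min
    return pick(candidates, key=candidates.get)


per_run = {
    "run1": {"accuracy": 0.85, "loss": 0.35},
    "run2": {"accuracy": 0.88, "loss": 0.40},
}
print(best_run(per_run, "accuracy"))                      # run2
print(best_run(per_run, "loss", higher_is_better=False))  # run1
```

The two calls disagree, which is why the main metric has to be stated before the runs are ranked.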
4.

What is wrong with this command to compare runs?
mlflow experiments compare-runs --experiment-id exp123 --run-ids run1,run2

medium
A. Command should be 'mlflow compare-runs' without 'experiments'
B. Experiment ID flag should be --experiment, not --experiment-id
C. Run IDs must be specified with --runs, not --run-ids
D. Run IDs should be separated by spaces, not commas

Solution

  1. Step 1: Check run IDs format

    The '--run-ids' flag, as used in this lesson, takes run IDs separated by spaces, not commas.
  2. Step 2: Verify other flags

    --experiment-id and --run-ids are correct flags; command includes 'experiments' correctly.
  3. Final Answer:

    Run IDs should be separated by spaces, not commas -> Option D
  4. Quick Check:

    Run IDs separated by spaces [OK]
Hint: Separate run IDs with spaces, not commas
Common Mistakes:
  • Using commas between run IDs
  • Changing correct flags incorrectly
  • Removing 'experiments' from command
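Why the separator matters can be shown with a generic command-line parser. The sketch below uses Python's argparse for a hypothetical tool named compare-runs-demo. It is not how MLflow parses its arguments and only illustrates how a multi-value flag treats spaces versus commas.

```python
import argparse

# Hypothetical demo parser, unrelated to the MLflow CLI implementation.
parser = argparse.ArgumentParser(prog="compare-runs-demo")
parser.add_argument("--experiment-id", required=True)
parser.add_argument("--run-ids", nargs="+", required=True)  # one or more values

spaced = parser.parse_args(["--experiment-id", "exp123", "--run-ids", "run1", "run2"])
comma = parser.parse_args(["--experiment-id", "exp123", "--run-ids", "run1,run2"])

print(spaced.run_ids)  # ['run1', 'run2']
print(comma.run_ids)   # ['run1,run2']
```

With commas the parser sees a single ID called 'run1,run2', which matches no run, so the comparison would fail or come back empty.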
5.

You want to compare three runs but only focus on the f1_score metric. Which command correctly filters to show only this metric?

hard
A. mlflow experiments compare-runs --experiment-id exp456 --run-ids runA runB runC --metric-keys f1_score
B. mlflow experiments compare-runs --experiment-id exp456 --run-ids runA runB runC --metrics f1_score
C. mlflow experiments compare-runs --experiment-id exp456 --run-ids runA runB runC --filter f1_score
D. mlflow experiments compare-runs --experiment-id exp456 --run-ids runA runB runC --metric-filter f1_score

Solution

  1. Step 1: Identify correct flag for metric filtering

    In the compare-runs syntax this lesson uses, the flag that filters metrics is '--metric-keys'.
  2. Step 2: Match command with options

    mlflow experiments compare-runs --experiment-id exp456 --run-ids runA runB runC --metric-keys f1_score uses '--metric-keys' correctly with the metric name 'f1_score'.
  3. Final Answer:

    mlflow experiments compare-runs --experiment-id exp456 --run-ids runA runB runC --metric-keys f1_score -> Option A
  4. Quick Check:

    Use --metric-keys to focus on specific metric [OK]
Hint: Use --metric-keys flag to show only chosen metric
Common Mistakes:
  • Using wrong flag like --metrics or --filter
  • Misspelling flag names
  • Omitting metric filter when needed
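Filtering a comparison down to chosen metric keys amounts to a projection over each run's metrics. A minimal sketch follows. The helper name select_metrics and all metric values are made up for illustration, and runC is assumed not to have logged f1_score.

```python
def select_metrics(per_run, metric_keys):
    """Keep only the requested metric names for every run (None if not logged)."""
    return {
        run_id: {key: metrics.get(key) for key in metric_keys}
        for run_id, metrics in per_run.items()
    }


# Illustrative values only.
per_run = {
    "runA": {"f1_score": 0.78, "accuracy": 0.91},
    "runB": {"f1_score": 0.82, "accuracy": 0.89},
    "runC": {"accuracy": 0.90},  # hypothetical: f1_score missing
}
focused = select_metrics(per_run, ["f1_score"])
print(focused)
# {'runA': {'f1_score': 0.78}, 'runB': {'f1_score': 0.82}, 'runC': {'f1_score': None}}
```

Keeping the None for runC, rather than dropping the run, makes it clear that the run was requested but has nothing to compare on this metric.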