
Champion-challenger model comparison in MLOps - Commands & Configuration

Introduction
When you have a machine learning model in production, you want to test if a new model is better before switching. Champion-challenger comparison helps you run the current model (champion) and a new model (challenger) side by side to compare their results safely.
When you want to test a new model's performance without stopping the current one.
When you want to compare two models on the same input data to see which predicts better.
When you want to reduce risk by validating a new model before full deployment.
When you want to collect metrics from both models to decide which to keep.
When you want to automate model selection in a machine learning pipeline.
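The "side by side on the same input" idea from the list above is often implemented as a shadow deployment: every request goes to both models, but only the champion's answer is served. A minimal sketch, with stand-in callables for the models (names and shapes here are illustrative):

```python
# Illustrative sketch of shadow (champion-challenger) serving.
# Models are stand-in callables; in practice they'd be loaded from a registry.
def shadow_predict(x, champion, challenger, log):
    """Serve the champion's prediction while recording both models' outputs."""
    champ = champion(x)
    chall = challenger(x)  # challenger sees the same input; its answer is never served
    log.append({"input": x, "champion": champ, "challenger": chall})
    return champ

if __name__ == "__main__":
    log = []
    result = shadow_predict(3, champion=lambda x: x * 2,
                            challenger=lambda x: x * 2 + 1, log=log)
    print(result)  # 6 -- the champion's answer
    print(log)     # both predictions recorded for offline comparison
```

Because the challenger never serves traffic, a bad challenger cannot hurt production while you collect comparison data.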
Commands
Run the champion model to generate predictions and log metrics for baseline comparison.
Terminal
mlflow run . -P model_type=champion
Expected Output
2024/06/01 12:00:00 INFO mlflow.projects: === Running command 'python train.py --model_type champion' === Training champion model... Metrics logged: accuracy=0.85 Run completed successfully
-P - Passes parameters to the MLflow project
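For `mlflow run . -P model_type=...` to work, MLflow expects an `MLproject` file in the project root that declares the parameter. A minimal sketch matching the logged command above (the contents are illustrative):

```yaml
name: champion_challenger

entry_points:
  main:
    parameters:
      model_type: {type: string, default: "champion"}
    command: "python train.py --model_type {model_type}"
```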
Run the challenger model to generate predictions and log metrics for comparison with the champion.
Terminal
mlflow run . -P model_type=challenger
Expected Output
2024/06/01 12:05:00 INFO mlflow.projects: === Running command 'python train.py --model_type challenger' === Training challenger model... Metrics logged: accuracy=0.88 Run completed successfully
-P - Passes parameters to the MLflow project
Compare the logged metrics of the champion and challenger runs to decide which model performs better. The MLflow CLI has no built-in run-comparison command; use the Python API's `mlflow.search_runs`, which returns an experiment's runs and their logged metrics as a DataFrame (or compare the runs visually in the MLflow UI).
Terminal
python -c "import mlflow; print(mlflow.search_runs(experiment_ids=['1'])[['run_id', 'metrics.accuracy']])"
Expected Output
              run_id  metrics.accuracy
0  challenger_run_id              0.88
1    champion_run_id              0.85
(run IDs are placeholders here; real IDs are long hashes.) The challenger outperforms the champion by 0.03 accuracy.
experiment_ids - Specifies the experiment(s) whose runs to list
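However the metrics are retrieved (CLI, UI, or the Python API), the promotion decision itself is plain logic. A sketch, with `min_gain` as an illustrative threshold so that tiny, noisy improvements don't trigger a switch:

```python
# Decide which run to promote from logged metrics. `min_gain` is an
# illustrative guard: the challenger must beat the champion by more
# than this margin to be promoted.
def pick_winner(runs, metric="accuracy", min_gain=0.0):
    champ = runs["champion"][metric]
    chall = runs["challenger"][metric]
    return "challenger" if chall - champ > min_gain else "champion"

runs = {"champion": {"accuracy": 0.85}, "challenger": {"accuracy": 0.88}}
print(pick_winner(runs))                 # challenger: 0.88 beats 0.85
print(pick_winner(runs, min_gain=0.05))  # champion: a 0.03 gain is below the bar
```

In a pipeline, the threshold is what keeps noise from flip-flopping the deployed model.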
Key Concept

If you remember nothing else from this pattern, remember: run both models side by side on the same data and compare their logged metrics before switching.

Code Example
MLOps
import mlflow

def train_model(model_type):
    # Simulate training and metric logging
    if model_type == 'champion':
        accuracy = 0.85
    else:
        accuracy = 0.88
    with mlflow.start_run(run_name=model_type):
        mlflow.log_param('model_type', model_type)
        mlflow.log_metric('accuracy', accuracy)
    print(f"Model {model_type} trained with accuracy {accuracy}")

if __name__ == '__main__':
    # Parse the --model_type flag, matching the MLproject command
    # 'python train.py --model_type champion'
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_type', default='champion')
    args = parser.parse_args()
    train_model(args.model_type)
Output
Model champion trained with accuracy 0.85
Common Mistakes
Running only the challenger model without running the champion for comparison.
You won't have a baseline to compare against, so you can't tell if the new model is better.
Always run and log metrics for both champion and challenger models on the same data.
Comparing models using different datasets or metrics.
This leads to unfair comparisons and wrong conclusions about model performance.
Use the same dataset and evaluation metrics for both models during comparison.
Switching to the challenger model without analyzing the comparison results.
You might deploy a worse model, causing performance drops in production.
Review the comparison metrics carefully and only promote the challenger if it clearly outperforms the champion.
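The second mistake above, comparing on different data, is easiest to avoid by scoring both models on one shared holdout set with one metric. A toy sketch (models are stand-in callables, data is made up):

```python
def evaluate(model, X, y):
    """Accuracy of `model` on a fixed holdout set."""
    preds = [model(x) for x in X]
    return sum(int(p == t) for p, t in zip(preds, y)) / len(y)

# One shared holdout set and one metric for both models.
X_test = [0, 1, 2, 3, 4, 5]
y_test = [0, 1, 0, 1, 0, 1]

champion = lambda x: x % 2         # stand-in "current" model
challenger = lambda x: int(x > 2)  # stand-in "new" model

acc_champion = evaluate(champion, X_test, y_test)
acc_challenger = evaluate(challenger, X_test, y_test)
print(acc_champion, acc_challenger)  # same data, same metric -> fair comparison
```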
Summary
Run the champion model and log its performance metrics.
Run the challenger model on the same data and log its metrics.
Compare the metrics of both models to decide which one to promote.
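The three steps above can be tied together in one sketch; training is simulated here with the tutorial's example accuracies, where real code would train and log via MLflow:

```python
# Simulated end-to-end champion-challenger loop; the accuracy numbers
# mirror the tutorial's example values.
def train(model_type):
    return {"champion": 0.85, "challenger": 0.88}[model_type]

results = {m: train(m) for m in ("champion", "challenger")}  # steps 1 and 2
winner = max(results, key=results.get)                       # step 3
print(f"promote: {winner} (accuracy={results[winner]})")
```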