
Canary releases for model updates in MLOps - Commands & Configuration

Introduction
When you update a machine learning model, you want to confirm that the new version works well before every user depends on it. A canary release sends a small share of traffic to the new model first, so you can check its behavior and limit the impact of any problems.
Use a canary release when you want to:
  • Test a new model version with a small group of users before full deployment
  • Compare performance between the old and new models in real time
  • Reduce risk by rolling out model updates gradually
  • Monitor the new model's behavior and roll back quickly if issues appear
  • Collect feedback or metrics on a new model version without affecting everyone
Commands
This command starts serving the current stable model on port 1234 so users can send requests to it.
Terminal
mlflow models serve -m runs:/1234567890abcdef/model -p 1234
Expected Output
2024/06/01 12:00:00 INFO mlflow.models.cli: Starting MLflow model server for model at runs:/1234567890abcdef/model on port 1234
2024/06/01 12:00:00 INFO mlflow.models.cli: Listening on http://0.0.0.0:1234
-m - Specifies the model path to serve
-p - Sets the port number for the server
This command starts serving the new model version on port 1235 for canary testing with a small user group.
Terminal
mlflow models serve -m runs:/abcdef1234567890/model -p 1235
Expected Output
2024/06/01 12:05:00 INFO mlflow.models.cli: Starting MLflow model server for model at runs:/abcdef1234567890/model on port 1235
2024/06/01 12:05:00 INFO mlflow.models.cli: Listening on http://0.0.0.0:1235
-m - Specifies the new model path to serve
-p - Sets a different port number for canary server
Send a prediction request to the stable model server to check that it is working. MLflow 2.x scoring servers expect the payload under a key such as "inputs" or "dataframe_split"; the bare "data" key comes from the older MLflow 1.x format.
Terminal
curl -X POST -H 'Content-Type: application/json' -d '{"inputs": [[5.1, 3.5, 1.4, 0.2]]}' http://localhost:1234/invocations
Expected Output
{"predictions": [0]}
Send the same prediction request to the canary model server to compare results.
Terminal
curl -X POST -H 'Content-Type: application/json' -d '{"inputs": [[5.1, 3.5, 1.4, 0.2]]}' http://localhost:1235/invocations
Expected Output
{"predictions": [0]}
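A single matching prediction says little about how the two versions compare. The sketch below shows one way to summarize agreement over a batch of responses. The function name agreement_rate and the sample prediction lists are hypothetical; in practice the lists would come from the two /invocations endpoints.

```python
from typing import Sequence


def agreement_rate(stable_preds: Sequence, canary_preds: Sequence) -> float:
    """Return the fraction of positions where both models predict the same value."""
    if len(stable_preds) != len(canary_preds):
        raise ValueError("prediction lists must have the same length")
    if not stable_preds:
        raise ValueError("need at least one prediction to compare")
    matches = sum(1 for s, c in zip(stable_preds, canary_preds) if s == c)
    return matches / len(stable_preds)


# Hypothetical predictions collected from the stable and canary servers.
stable = [0, 1, 2, 1, 0]
canary = [0, 1, 2, 2, 0]
print(f"Agreement: {agreement_rate(stable, canary):.0%}")  # Agreement: 80%
```

A low agreement rate is not automatically bad, since the new model may be correcting old mistakes, but it is a signal to inspect the disagreements before widening the rollout.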
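After successful canary testing, promote the new model version to production for all users. The MLflow CLI has no command for stage transitions, so this step calls the model registry through the Python client. It assumes "my-model" is already registered and that the tracking URI points to a server with a model registry. Stages are deprecated in recent MLflow releases (2.9 and later) in favor of model version aliases, so treat this as one possible approach.
Terminal
python -c "from mlflow.tracking import MlflowClient; mv = MlflowClient().transition_model_version_stage(name='my-model', version='2', stage='Production'); print(mv.name, mv.version, mv.current_stage)"
Expected Output
my-model 2 Production
name - Specifies the registered model name
version - Specifies the model version to promote
stage - Sets the new stage for the model version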
Key Concept

If you remember nothing else from this pattern, remember: deploy the new model to a small group first, monitor it closely, then promote it to all users only if it works well.
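The "small group first, then everyone" idea is often applied in several steps rather than one jump. The following is a minimal sketch of such a ramp, assuming a hypothetical step schedule (ROLLOUT_STEPS) and a boolean health signal that your monitoring would supply; neither comes from MLflow.

```python
# Hypothetical rollout schedule: fraction of traffic sent to the canary at each stage.
ROLLOUT_STEPS = [0.05, 0.25, 0.50, 1.00]


def next_canary_fraction(current: float, healthy: bool) -> float:
    """Return the next traffic fraction for the canary model.

    An unhealthy canary is rolled back to 0.0; a healthy one advances
    to the next larger step and stays at 1.0 once fully rolled out.
    """
    if not healthy:
        return 0.0
    for step in ROLLOUT_STEPS:
        if step > current:
            return step
    return 1.0


print(next_canary_fraction(0.05, healthy=True))   # 0.25
print(next_canary_fraction(0.25, healthy=False))  # 0.0
```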

Code Example
import mlflow
import requests

# URLs of the stable and canary model servers started in the Commands section
stable_url = 'http://localhost:1234/invocations'
canary_url = 'http://localhost:1235/invocations'

# Sample input: one row with four features.
# MLflow 2.x servers expect the 'inputs' key; MLflow 1.x examples used 'data'.
input_data = {"inputs": [[5.1, 3.5, 1.4, 0.2]]}

# Send the request to the stable model and fail fast on HTTP errors
response_stable = requests.post(stable_url, json=input_data, timeout=10)
response_stable.raise_for_status()
print('Stable model prediction:', response_stable.json())

# Send the same request to the canary model
response_canary = requests.post(canary_url, json=input_data, timeout=10)
response_canary.raise_for_status()
print('Canary model prediction:', response_canary.json())

# Log a metric for the canary test run.
# 0.95 is a placeholder: compute the real value from labeled canary traffic.
with mlflow.start_run(run_name='canary_test'):
    mlflow.log_metric('canary_accuracy', 0.95)
print('Metric logged for canary test')
Common Mistakes
Deploying the new model directly to all users without testing
This can cause widespread errors or bad predictions if the new model has issues.
Use canary releases to test the new model with a small group before full rollout.
Not monitoring the canary model's performance during testing
Without monitoring, you might miss problems and deploy a faulty model.
Collect metrics and logs during canary testing to catch issues early.
Using the same port or endpoint for both stable and canary models
This causes conflicts and makes it impossible to route traffic correctly.
Serve stable and canary models on different ports or endpoints.
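Serving the two models on different ports only helps if something in front of them decides which server each request goes to. Here is a minimal sketch of weighted random routing, assuming the two local URLs from the Commands section; choose_backend and canary_fraction are hypothetical names, and a production setup would more likely do this in a load balancer or service mesh.

```python
import random

STABLE_URL = "http://localhost:1234/invocations"
CANARY_URL = "http://localhost:1235/invocations"


def choose_backend(canary_fraction: float, rng: random.Random) -> str:
    """Pick the canary URL with probability canary_fraction, else the stable URL."""
    if not 0.0 <= canary_fraction <= 1.0:
        raise ValueError("canary_fraction must be between 0 and 1")
    return CANARY_URL if rng.random() < canary_fraction else STABLE_URL


# Simulate 10,000 routing decisions with 10% of traffic sent to the canary.
rng = random.Random(42)
picks = [choose_backend(0.1, rng) for _ in range(10_000)]
share = picks.count(CANARY_URL) / len(picks)
print(f"Canary share: {share:.1%}")
```

Random routing is simple, but the same user may hit different models on consecutive requests. If that matters, route on a stable key such as the user ID instead.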
Summary
Start serving the stable model on one port for all users.
Serve the new model version on a different port for a small group (canary).
Send test requests to both models to compare predictions.
Monitor canary model performance and log metrics.
Promote the new model to production only after successful testing.

Practice

1. What is the main purpose of a canary release when updating machine learning models?
easy
A. To train the model faster using more data
B. To immediately replace the old model with the new one for all users
C. To test the new model on a small group of users before full deployment
D. To reduce the size of the model for faster inference

Solution

  1. Step 1: Understand canary release concept

    Canary releases deploy a new model to a small subset of users first to test its performance safely.
  2. Step 2: Compare options

    Only option C, "To test the new model on a small group of users before full deployment", describes a limited test before full rollout, which is the main purpose.
  3. Final Answer:

    To test the new model on a small group of users before full deployment -> Option C
  4. Quick Check:

    Canary release = small group test [OK]
Hint: Canary means small test group before full rollout [OK]
Common Mistakes:
  • Thinking canary releases replace models immediately
  • Confusing canary with model training speed
  • Assuming canary reduces model size
2. Which of the following is the correct way to specify 10% traffic to a new model version in a deployment configuration?
easy
A. "traffic_split": {"new_model": 10, "old_model": 90}
B. "traffic_split": {"new_model": 0.1, "old_model": 0.9}
C. "traffic_split": {"new_model": "10%", "old_model": "90%"}
D. "traffic_split": {"new_model": 1, "old_model": 9}

Solution

  1. Step 1: Understand traffic split format

    In this configuration format, traffic splits are decimal fractions that sum to 1.0, so 10% is written as 0.1. (Some tools, such as Istio, take integer weights that sum to 100 instead, so always check the format your platform expects.)
  2. Step 2: Evaluate options

    "traffic_split": {"new_model": 0.1, "old_model": 0.9} uses decimal fractions that sum to 1.0. "traffic_split": {"new_model": 10, "old_model": 90} uses whole-number percentages rather than fractions. "traffic_split": {"new_model": "10%", "old_model": "90%"} uses strings with percent signs, which are not numeric weights. "traffic_split": {"new_model": 1, "old_model": 9} sums to 10, not 1.
  3. Final Answer:

    "traffic_split": {"new_model": 0.1, "old_model": 0.9} -> Option B
  4. Quick Check:

    Traffic split decimals sum to 1 [OK]
Hint: Use decimals summing to 1 for traffic percentages [OK]
Common Mistakes:
  • Using integers instead of decimals for traffic split
  • Including percent signs in values
  • Traffic splits not summing to 1
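The mistakes listed above can be caught before deployment with a small validation step. The sketch below assumes the fraction-based traffic_split format used in this question; validate_traffic_split and is_valid are hypothetical helpers, not part of any deployment tool.

```python
import math


def validate_traffic_split(split: dict) -> None:
    """Raise ValueError unless every weight is a number in [0, 1] and they sum to 1.0."""
    if not split:
        raise ValueError("traffic split is empty")
    for name, weight in split.items():
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(weight, bool) or not isinstance(weight, (int, float)):
            raise ValueError(f"{name}: weight must be a number, got {weight!r}")
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"{name}: weight {weight} is outside [0, 1]")
    total = sum(split.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights sum to {total}, expected 1.0")


def is_valid(split: dict) -> bool:
    """Convenience wrapper that returns True or False instead of raising."""
    try:
        validate_traffic_split(split)
    except ValueError:
        return False
    return True


print(is_valid({"new_model": 0.1, "old_model": 0.9}))      # True
print(is_valid({"new_model": 10, "old_model": 90}))        # False
print(is_valid({"new_model": "10%", "old_model": "90%"}))  # False
```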
3. Given this simplified code snippet for routing traffic in a canary release:
def route_request(user_id):
    if user_id % 10 == 0:
        return "new_model"
    else:
        return "old_model"

print(route_request(20))
print(route_request(23))

What will be the output?
medium
A. new_model\nold_model
B. old_model\nnew_model
C. new_model\nnew_model
D. old_model\nold_model

Solution

  1. Step 1: Analyze routing logic

    The function sends users with user_id divisible by 10 to the new model, others to old model.
  2. Step 2: Evaluate given user_ids

    For user_id 20: 20 % 10 == 0, so returns "new_model". For user_id 23: 23 % 10 == 3, so returns "old_model".
  3. Final Answer:

    new_model\nold_model -> Option A
  4. Quick Check:

    Divisible by 10 = new_model [OK]
Hint: Check modulo condition for routing [OK]
Common Mistakes:
  • Misunderstanding modulo operator
  • Swapping outputs for user IDs
  • Assuming all users get new model
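The modulo rule above only works for integer IDs and ties the canary group to IDs ending in 0. A common alternative, sketched here under the assumption that user IDs are strings, hashes the ID into a bucket so that each user consistently sees the same model. The canary_fraction parameter and the 10,000-bucket resolution are illustrative choices. SHA-256 is used because Python's built-in hash() for strings is salted per process and would not give stable routing across restarts.

```python
import hashlib

NUM_BUCKETS = 10_000  # illustrative resolution: 0.01% of traffic per bucket


def route_request(user_id: str, canary_fraction: float = 0.1) -> str:
    """Deterministically map a user ID to "new_model" or "old_model"."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a bucket number.
    bucket = int.from_bytes(digest[:8], "big") % NUM_BUCKETS
    return "new_model" if bucket < canary_fraction * NUM_BUCKETS else "old_model"


# The same user always gets the same answer.
print(route_request("user-42") == route_request("user-42"))  # True

# Across many users, roughly canary_fraction of them land on the new model.
routes = [route_request(f"user-{i}") for i in range(10_000)]
canary_share = routes.count("new_model") / len(routes)
print(f"Canary share: {canary_share:.1%}")
```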
4. You deployed a canary release but noticed the new model is receiving 100% of traffic instead of 10%. Which fix will correct this issue?
medium
A. Change traffic split from {"new_model": 1, "old_model": 0} to {"new_model": 0.1, "old_model": 0.9}
B. Increase the new model traffic to 50% to balance load
C. Restart the deployment without changing traffic split
D. Remove the old model from deployment

Solution

  1. Step 1: Identify traffic split error

    The current split {"new_model": 1, "old_model": 0} sends all traffic to the new model, which is why it receives 100% of requests.
  2. Step 2: Correct traffic split values

    Setting the split to {"new_model": 0.1, "old_model": 0.9} routes 10% of traffic to the new model and 90% to the old model.
  3. Final Answer:

    Change traffic split from {"new_model": 1, "old_model": 0} to {"new_model": 0.1, "old_model": 0.9} -> Option A
  4. Quick Check:

    Traffic split controls user percentage [OK]
Hint: Check traffic split decimals sum to 1 [OK]
Common Mistakes:
  • Restarting without fixing traffic split
  • Increasing new model traffic without reason
  • Removing old model prematurely
5. You want to safely update a model with a canary release. The new model shows better accuracy but higher latency. What is the best approach to decide whether to proceed with full rollout?
hard
A. Deploy new model only to internal users without monitoring
B. Ignore latency since accuracy is more important; rollout immediately
C. Increase traffic to new model to 100% to gather more data quickly
D. Monitor both accuracy and latency metrics during canary; rollback if latency impact is unacceptable

Solution

  1. Step 1: Understand trade-offs in canary release

    A canary release tests the new model on every metric that affects user experience, including both accuracy and latency.
  2. Step 2: Choose monitoring and rollback strategy

    Monitoring both metrics supports an informed decision: roll back if the added latency harms user experience despite the accuracy gain.
  3. Final Answer:

    Monitor both accuracy and latency metrics during canary; rollback if latency impact is unacceptable -> Option D
  4. Quick Check:

    Balance metrics and rollback if needed [OK]
Hint: Watch all key metrics before full rollout [OK]
Common Mistakes:
  • Ignoring latency impact
  • Rushing full rollout without monitoring
  • Skipping rollback plans
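The accuracy-versus-latency decision can be written down as an explicit rule so that it is applied consistently. This is a sketch under assumed thresholds: ModelStats, decide, and the 1.2x limit on 95th-percentile latency are hypothetical and should be replaced by whatever your service-level objectives require.

```python
from dataclasses import dataclass
from statistics import quantiles
from typing import List


@dataclass
class ModelStats:
    accuracy: float
    latencies_ms: List[float]


def p95(latencies_ms: List[float]) -> float:
    # quantiles(..., n=20) returns 19 cut points; the last one is the 95th percentile.
    return quantiles(latencies_ms, n=20)[-1]


def decide(stable: ModelStats, canary: ModelStats, max_latency_ratio: float = 1.2) -> str:
    """Return "promote" only if the canary is at least as accurate and not much slower."""
    if canary.accuracy < stable.accuracy:
        return "rollback"
    if p95(canary.latencies_ms) > max_latency_ratio * p95(stable.latencies_ms):
        return "rollback"
    return "promote"


# Hypothetical measurements: the canary is more accurate but 1.5x slower at p95.
stable = ModelStats(accuracy=0.91, latencies_ms=[100.0] * 20)
canary = ModelStats(accuracy=0.94, latencies_ms=[150.0] * 20)
print(decide(stable, canary))  # rollback
```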