MLOps · DevOps · ~10 mins

A/B testing model versions in MLOps - Step-by-Step Execution

Process Flow - A/B testing model versions
Deploy Model Version A
Deploy Model Version B
Route Traffic Split
Users get Predictions
Collect Metrics
Compare Performance
Choose Best Model
This flow shows deploying two model versions, splitting user traffic between them, collecting performance data, and then choosing the better model.
Execution Sample
MLOps
deploy_model('v1')
deploy_model('v2')
route_traffic({'v1': 50, 'v2': 50})
collect_metrics()
compare_metrics()
choose_best_model()
This pseudocode deploys two model versions, splits traffic evenly between them, collects performance data, compares the results, and selects the best model.
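The pseudocode above can be sketched as a runnable Python example. This is a minimal in-memory sketch, not a real MLOps library: the function names mirror the pseudocode, and the metric values are the example numbers from the process table below.

```python
# In-memory state standing in for a model registry, a traffic router,
# and a monitoring system. All names here are illustrative.
deployed = {}       # version -> deployment status
traffic_split = {}  # version -> percent of user traffic
metrics = {}        # version -> collected performance metrics

def deploy_model(version):
    deployed[version] = "deployed"

def route_traffic(split):
    # split maps version -> percent; percentages must total 100.
    assert sum(split.values()) == 100, "traffic percentages must total 100"
    traffic_split.clear()
    traffic_split.update(split)

def collect_metrics():
    # In a real system these come from production monitoring;
    # here we hard-code the example values from the process table.
    metrics["v1"] = {"accuracy": 0.85, "latency_ms": 100}
    metrics["v2"] = {"accuracy": 0.88, "latency_ms": 110}

def choose_best_model():
    # Pick the version with the highest accuracy,
    # then route all traffic to it.
    best = max(metrics, key=lambda v: metrics[v]["accuracy"])
    route_traffic({best: 100})
    return best

deploy_model("v1")
deploy_model("v2")
route_traffic({"v1": 50, "v2": 50})
collect_metrics()
print(choose_best_model())  # -> v2
```

Running it ends with all traffic on v2, matching the final row of the status tracker.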
Process Table
Step | Action | Details | Result
1 | Deploy Model Version A | Model v1 deployed to production | Model v1 ready
2 | Deploy Model Version B | Model v2 deployed to production | Model v2 ready
3 | Route Traffic Split | 50% users to v1, 50% users to v2 | Traffic split established
4 | Users get Predictions | Users receive predictions from assigned model | Predictions served
5 | Collect Metrics | Gather accuracy and latency for v1 and v2 | Metrics collected: v1=0.85 acc, v2=0.88 acc
6 | Compare Performance | Compare accuracy and latency of v1 vs v2 | v2 performs better
7 | Choose Best Model | Select model with better metrics | Model v2 chosen for full traffic
8 | End | A/B test complete | Traffic routed 100% to v2
💡 A/B test ends after comparing metrics and selecting the best model version
Status Tracker
Variable | Start | After Step 3 | After Step 5 | After Step 7 | Final
model_v1_status | not deployed | deployed | deployed | deployed | deployed
model_v2_status | not deployed | deployed | deployed | deployed | deployed
traffic_split | none | 50% v1 / 50% v2 | 50% v1 / 50% v2 | 100% v2 | 100% v2
metrics_v1 | none | none | accuracy=0.85, latency=100ms | accuracy=0.85, latency=100ms | accuracy=0.85, latency=100ms
metrics_v2 | none | none | accuracy=0.88, latency=110ms | accuracy=0.88, latency=110ms | accuracy=0.88, latency=110ms
chosen_model | none | none | none | v2 | v2
Key Moments - 3 Insights
Why do we split traffic between two model versions instead of switching all users at once?
Splitting traffic (see Step 3 in the execution table) lets us compare real user responses to both models safely, without exposing all users to a potentially worse model.
How do we decide which model is better?
We compare collected metrics like accuracy and latency (Step 6). The model with better performance metrics is chosen (Step 7).
What happens to the traffic after choosing the best model?
After selecting the best model (Step 7), all user traffic is routed to that model (Step 8), ending the A/B test.
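One practical detail behind the traffic split: each user should be assigned to a version consistently, so their experience doesn't flip between models mid-test. A common approach is hashing the user ID into a bucket. This sketch assumes the 50/50 split from Step 3; the function name and split dict are illustrative.

```python
import hashlib

def assign_version(user_id, split={"v1": 50, "v2": 50}):
    # Hash the user ID into a stable bucket from 0-99, so the same
    # user always gets the same model version for the whole test.
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    cumulative = 0
    for version, percent in split.items():
        cumulative += percent
        if bucket < cumulative:
            return version
    return version  # fallback in case percentages don't cover 100

# Over many users, assignments land close to the requested 50/50 split.
assignments = [assign_version(f"user-{i}") for i in range(1000)]
print(assignments.count("v1"), assignments.count("v2"))  # roughly 500 / 500
```

Sticky assignment also keeps the collected metrics clean: each user's interactions count toward exactly one version.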
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table at Step 3. What is the traffic split between model versions?
A. 100% to model v2
B. 100% to model v1
C. 50% to model v1 and 50% to model v2
D. Traffic not split yet
💡 Hint
Check the 'Details' column in Step 3 of the execution_table.
According to variable_tracker, what is the accuracy of model v2 after Step 5?
A. 0.88
B. 0.85
C. Not collected yet
D. 1.00
💡 Hint
Look at the 'metrics_v2' row under 'After Step 5' in variable_tracker.
If model v1 had better accuracy than v2, what would change in the execution_table at Step 7?
A. Traffic split would remain 50/50
B. Model v1 would be chosen for full traffic
C. Model v2 would still be chosen
D. Test would end without choosing a model
💡 Hint
Step 7 shows which model is chosen based on performance comparison in Step 6.
Concept Snapshot
A/B testing model versions:
- Deploy two model versions simultaneously.
- Split user traffic between them (e.g., 50/50).
- Collect performance metrics (accuracy, latency).
- Compare metrics to find the better model.
- Route all traffic to best model after test.
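The comparison step in the snapshot above can be made concrete. This sketch prefers higher accuracy and breaks ties on lower latency, using the example numbers from the status tracker; a real test would also check that the accuracy gap is statistically significant before promoting a winner.

```python
# Example metrics, matching the status tracker values.
metrics = {
    "v1": {"accuracy": 0.85, "latency_ms": 100},
    "v2": {"accuracy": 0.88, "latency_ms": 110},
}

def compare_metrics(metrics):
    # Sort by accuracy descending, then latency ascending,
    # and return the winning version.
    return max(
        metrics,
        key=lambda v: (metrics[v]["accuracy"], -metrics[v]["latency_ms"]),
    )

print(compare_metrics(metrics))  # -> v2
```

Here v2 wins on accuracy (0.88 vs 0.85) despite its slightly higher latency, so it receives 100% of traffic, as in Step 8 of the process table.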
Full Transcript
A/B testing model versions means running two versions of a machine learning model at the same time. First, both models are deployed. Then, user traffic is split evenly so some users get predictions from model version A and others from version B. While users interact, the system collects performance data like accuracy and response time for each model. After enough data is collected, the models are compared. The one with better performance is chosen, and all user traffic is routed to that model. This process helps safely find the best model without risking all users on an untested version.