
Multi-region deployment in MLOps - Commands & Configuration

Introduction
Deploying machine learning models in multiple geographic regions reduces latency and improves reliability for users worldwide. It addresses slow responses caused by network distance and service interruptions caused by regional failures.
Use multi-region deployment in these situations:
When your users are spread across different continents and need fast access to ML predictions.
When you want to keep your ML service running even if one region has an outage.
When you need to comply with data residency laws by serving predictions from the region where user data must stay.
When you want to balance traffic across regions to avoid overloading a single region.
When you want to test model performance in different environments before a full rollout.
Config File - deployment_config.yaml
regions:
  - name: us-east-1
    endpoint: https://us-east-1.ml.example.com
  - name: eu-west-1
    endpoint: https://eu-west-1.ml.example.com
  - name: ap-southeast-1
    endpoint: https://ap-southeast-1.ml.example.com
model:
  name: my-ml-model
  version: v1.2.0
  replicas: 3
traffic_routing:
  strategy: latency_based
  fallback_region: us-east-1

regions: Lists the geographic locations where the model will be deployed with their endpoints.

model: Specifies the model name, version, and number of replicas per region for availability.

traffic_routing: Defines how user requests are directed, here based on lowest latency with a fallback region.
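The latency_based strategy with a fallback region can be sketched in a few lines of Python. This is only an illustration of the routing rule described in the config, not how any particular load balancer implements it. The function name pick_region, the use of None to mark an unreachable region, and the probe values are all assumptions made for the example.

```python
from typing import Dict, Optional

# Mirrors the regions and traffic_routing blocks of deployment_config.yaml.
REGION_ENDPOINTS = {
    "us-east-1": "https://us-east-1.ml.example.com",
    "eu-west-1": "https://eu-west-1.ml.example.com",
    "ap-southeast-1": "https://ap-southeast-1.ml.example.com",
}
FALLBACK_REGION = "us-east-1"


def pick_region(latencies_ms: Dict[str, Optional[float]]) -> str:
    """Return the configured region with the lowest measured latency.

    A latency of None marks a region as unreachable (an assumption of this
    sketch). If no configured region is reachable, the fallback is returned.
    """
    reachable = {
        region: ms
        for region, ms in latencies_ms.items()
        if region in REGION_ENDPOINTS and ms is not None
    }
    if not reachable:
        return FALLBACK_REGION
    return min(reachable, key=lambda region: reachable[region])


# Hypothetical probe results, in milliseconds, as seen from one client.
probe = {"us-east-1": 120.0, "eu-west-1": 35.0, "ap-southeast-1": None}
chosen = pick_region(probe)
print(chosen, REGION_ENDPOINTS[chosen])
# → eu-west-1 https://eu-west-1.ml.example.com
```

In production this decision is usually made by DNS or a global load balancer rather than by client code, but the rule is the same: lowest latency among healthy regions, otherwise the fallback.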

Commands
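Deploys model version 1.2.0 to the US East region with 3 replicas for availability. The mlflow deployments CLI requires a deployment target (-t) and has no built-in --region or --replicas flags; target-specific settings are passed as -C key=value pairs. The target name and the config keys below are placeholders that depend on your deployment plugin.
Terminal
mlflow deployments create -t <target> --name my-ml-model-us-east-1 --model-uri models:/my-ml-model/v1.2.0 -C region=us-east-1 -C replicas=3
Expected Output (illustrative; the exact message depends on the target plugin)
Deployment 'my-ml-model-us-east-1' created successfully in region us-east-1 with 3 replicas.
-t / --target - Selects the deployment target (plugin) that creates the deployment.
--name - Sets the deployment name.
--model-uri - Points at the registered model version to deploy. The MLflow Model Registry numbers versions with integers (for example models:/my-ml-model/3), so v1.2.0 here stands for whichever registry version carries that release.
-C region=... - Passes the target region to the plugin (the key name is plugin-specific).
-C replicas=... - Passes how many instances to run for load balancing and fault tolerance (the key name is plugin-specific).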
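Deploys the same model version to the Europe West region with 3 replicas. The mlflow deployments CLI requires a target (-t), and region and replica count are passed as plugin-specific -C key=value settings, so the target name and key names here are placeholders.
Terminal
mlflow deployments create -t <target> --name my-ml-model-eu-west-1 --model-uri models:/my-ml-model/v1.2.0 -C region=eu-west-1 -C replicas=3
Expected Output (illustrative; the exact message depends on the target plugin)
Deployment 'my-ml-model-eu-west-1' created successfully in region eu-west-1 with 3 replicas.
-t / --target - Selects the deployment target (plugin) that creates the deployment.
--name - Sets the deployment name.
-C region=... - Passes the target region to the plugin (the key name is plugin-specific).
-C replicas=... - Passes how many instances to run for load balancing and fault tolerance (the key name is plugin-specific).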
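Repeating the create command by hand for every region invites copy-paste mistakes, so the per-region commands can be generated from one list. The sketch below only builds and prints the commands as a dry run. The target name my-target and the -C keys region and replicas are assumptions for illustration; real key names depend on the deployment plugin you use.

```python
import shlex
from typing import List

REGIONS = ["us-east-1", "eu-west-1", "ap-southeast-1"]
MODEL_NAME = "my-ml-model"
MODEL_URI = "models:/my-ml-model/v1.2.0"
REPLICAS = 3
TARGET = "my-target"  # hypothetical deployment target name


def build_create_command(region: str) -> List[str]:
    """Build the argument list for one regional `mlflow deployments create` call."""
    return [
        "mlflow", "deployments", "create",
        "--target", TARGET,
        "--name", f"{MODEL_NAME}-{region}",
        "--model-uri", MODEL_URI,
        # -C keys are plugin-specific; "region" and "replicas" are assumed names.
        "-C", f"region={region}",
        "-C", f"replicas={REPLICAS}",
    ]


for region in REGIONS:
    # Dry run: print each command instead of executing it.
    print(shlex.join(build_create_command(region)))
```

To execute instead of print, each list can be passed to subprocess.run(..., check=True), which stops at the first region that fails to deploy.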
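Lists all active deployments to verify that the model is running in multiple regions. The list command also requires the deployment target (-t); the tabular output shown here is illustrative, since the actual format depends on the target plugin.
Terminal
mlflow deployments list -t <target>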
Expected Output
NAME                    REGION      MODEL         VERSION   REPLICAS
my-ml-model-us-east-1   us-east-1   my-ml-model   v1.2.0    3
my-ml-model-eu-west-1   eu-west-1   my-ml-model   v1.2.0    3
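Reading the listing by eye works for two regions but not for many. A small check can compare the listing against the regions in deployment_config.yaml; note that the config names three regions while only two were deployed in the steps so far. The function name find_gaps and the row dictionaries are assumptions of this sketch, with the rows transcribed by hand from the listing.

```python
from typing import Dict, List

# Regions and replica count taken from deployment_config.yaml.
EXPECTED_REGIONS = ["us-east-1", "eu-west-1", "ap-southeast-1"]
MIN_REPLICAS = 3

# Rows transcribed from the deployment listing.
deployments = [
    {"name": "my-ml-model-us-east-1", "region": "us-east-1", "replicas": 3},
    {"name": "my-ml-model-eu-west-1", "region": "eu-west-1", "replicas": 3},
]


def find_gaps(
    expected: List[str], rows: List[dict], min_replicas: int
) -> Dict[str, List[str]]:
    """Report expected regions that are missing or have too few replicas."""
    deployed = {row["region"]: row["replicas"] for row in rows}
    missing = [r for r in expected if r not in deployed]
    under = [r for r in expected if r in deployed and deployed[r] < min_replicas]
    return {"missing": missing, "under_replicated": under}


print(find_gaps(EXPECTED_REGIONS, deployments, MIN_REPLICAS))
# → {'missing': ['ap-southeast-1'], 'under_replicated': []}
```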
Sends a prediction request to the US East region endpoint to test the deployed model. MLflow 2.x scoring servers expect the input under a key such as inputs or dataframe_split, with one inner list per row.
Terminal
curl -X POST https://us-east-1.ml.example.com/invocations -H 'Content-Type: application/json' -d '{"inputs": [[5.1, 3.5, 1.4, 0.2]]}'
Expected Output
{"predictions": ["setosa"]}
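The same request should be sent to every regional endpoint, not just one. The sketch below loops over the endpoints from deployment_config.yaml and records which regions return a predictions field. The helper names post_json and smoke_test are hypothetical, the payload uses the inputs key accepted by MLflow 2.x scoring servers (adjust it to your server's schema), and the demo at the end uses a fake sender that simulates an outage so it runs without network access.

```python
import json
import urllib.request
from typing import Callable, Dict

ENDPOINTS = {
    "us-east-1": "https://us-east-1.ml.example.com",
    "eu-west-1": "https://eu-west-1.ml.example.com",
    "ap-southeast-1": "https://ap-southeast-1.ml.example.com",
}
PAYLOAD = {"inputs": [[5.1, 3.5, 1.4, 0.2]]}


def post_json(url: str, payload: dict, timeout: float = 5.0) -> dict:
    """POST a JSON payload and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def smoke_test(
    endpoints: Dict[str, str], send: Callable[[str, dict], dict] = post_json
) -> Dict[str, bool]:
    """Return, per region, whether /invocations answered with predictions."""
    results = {}
    for region, base_url in endpoints.items():
        try:
            body = send(f"{base_url}/invocations", PAYLOAD)
            results[region] = isinstance(body, dict) and "predictions" in body
        except (OSError, ValueError):
            # Network errors and timeouts are OSError; invalid JSON is ValueError.
            results[region] = False
    return results


def fake_send(url: str, payload: dict) -> dict:
    """Stand-in sender for the demo: eu-west-1 is simulated as down."""
    if "eu-west-1" in url:
        raise OSError("simulated regional outage")
    return {"predictions": ["setosa"]}


print(smoke_test(ENDPOINTS, send=fake_send))
# → {'us-east-1': True, 'eu-west-1': False, 'ap-southeast-1': True}
```

Calling smoke_test(ENDPOINTS) without the send argument performs real HTTP requests against the configured endpoints.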
Key Concept

If you remember nothing else from this pattern, remember: deploying your ML model in multiple regions reduces latency and improves reliability by serving each user from a region close to them.

Common Mistakes
Mistake: Deploying the model in only one region when users are global.
Why it matters: Distant users get slow responses, and a single regional outage causes downtime.
Fix: Deploy the model in multiple regions close to your users.
Mistake: Not specifying the number of replicas per region.
Why it matters: A region running a single instance is a single point of failure and handles load poorly.
Fix: Run at least 2 replicas per region (the example uses 3) so that one failed instance does not take the region offline.
Mistake: Forgetting to test the deployed endpoints with real prediction requests.
Why it matters: You won't know whether the deployment works until users report issues.
Fix: Send test requests to each region's endpoint after deployment.
Summary
Create deployments of your ML model in each target region with specified replicas.
Verify deployments are active using the deployment list command.
Test each regional endpoint by sending prediction requests to ensure proper operation.