How to Configure Auto Scaling in Cloud Run on GCP
To configure auto scaling in Cloud Run, set the --min-instances and --max-instances flags during deployment or update to control the minimum and maximum number of container instances. Cloud Run automatically scales instances based on incoming request load within these limits.
Syntax
Use the gcloud run deploy command with flags to control auto scaling:
- --min-instances: Minimum number of container instances to keep running.
- --max-instances: Maximum number of container instances allowed.
- --concurrency: Number of requests each instance can handle simultaneously (affects scaling).
bash
gcloud run deploy SERVICE_NAME \
  --image IMAGE_URL \
  --min-instances MIN_INSTANCES \
  --max-instances MAX_INSTANCES \
  --concurrency CONCURRENCY
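The same scaling flags can also be applied to a service that is already deployed, using gcloud run services update. The sketch below only builds and prints such a command so it can be reviewed before running; the helper name build_update_cmd and the values passed to it are illustrative assumptions, not part of gcloud.

```bash
#!/usr/bin/env bash
# Hypothetical helper: assembles a `gcloud run services update` command
# that changes the scaling limits of an existing service.
build_update_cmd() {
  local service="$1" region="$2" min="$3" max="$4"
  printf 'gcloud run services update %s --region %s --min-instances %s --max-instances %s\n' \
    "$service" "$region" "$min" "$max"
}

# Placeholder values for illustration only.
update_cmd="$(build_update_cmd hello-service us-central1 1 5)"
echo "$update_cmd"
```

Printing the command first is a deliberate choice: it can be inspected, then pasted into a shell where gcloud is installed and authenticated.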
Example
This example deploys a Cloud Run service named hello-service with a minimum of 1 instance, a maximum of 5 instances, and concurrency set to 80 requests per instance. This setup allows Cloud Run to automatically scale between 1 and 5 instances based on traffic.
bash
gcloud run deploy hello-service \
  --image gcr.io/cloudrun/hello \
  --min-instances 1 \
  --max-instances 5 \
  --concurrency 80 \
  --region us-central1 \
  --platform managed
Output
Deploying service [hello-service]...
Done.
Service URL: https://hello-service-xyz.a.run.app
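As a rough sanity check on the limits in this example, the maximum number of requests the service can process at the same time is the product of the instance cap and the per-instance concurrency. The arithmetic below uses the example's values; the variable names are chosen here for illustration.

```bash
#!/usr/bin/env bash
# Values taken from the example deployment above.
max_instances=5
concurrency=80

# Upper bound on requests that can be in flight at once.
max_in_flight=$(( max_instances * concurrency ))
echo "Upper bound on concurrent requests: $max_in_flight"   # prints 400
```

Requests beyond this bound have to wait for capacity, so the result is a useful input when choosing --max-instances.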
Common Pitfalls
- Setting --min-instances too high can increase costs because instances run even with no traffic.
- Setting --max-instances too low can cause request delays or errors when traffic spikes.
- Ignoring concurrency settings may lead to inefficient scaling; higher concurrency means fewer instances needed.
- Not specifying region or platform can cause deployment errors or unexpected defaults.
bash
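# Wrong: the minimum (10) exceeds the maximum (2), and no region is given
gcloud run deploy my-service --image gcr.io/my-image --min-instances 10 --max-instances 2
# Right: minimum below maximum, with region and platform stated explicitly
gcloud run deploy my-service --image gcr.io/my-image --min-instances 1 --max-instances 10 --region us-central1 --platform managed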
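One way to catch the inverted-limits mistake before it reaches gcloud is a small pre-flight check in the deployment script. This is a minimal sketch: validate_scaling is a hypothetical helper written for this article, not a gcloud feature, and it checks only that both limits are non-negative integers and that the minimum does not exceed the maximum.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of gcloud.
validate_scaling() {
  local min="$1" max="$2"
  # Both limits must be non-negative integers.
  if ! [[ "$min" =~ ^[0-9]+$ && "$max" =~ ^[0-9]+$ ]]; then
    echo "error: limits must be non-negative integers" >&2
    return 1
  fi
  # The minimum may not exceed the maximum.
  if (( min > max )); then
    echo "error: min-instances ($min) exceeds max-instances ($max)" >&2
    return 1
  fi
}

# The limits from the "Wrong" command fail the check.
if validate_scaling 10 2; then
  echo "limits ok"
else
  echo "limits rejected"
fi
```

In a real script, the gcloud run deploy call would go in the success branch so that it runs only when the limits are consistent.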
Quick Reference
| Flag | Description | Example Value |
|---|---|---|
| --min-instances | Minimum number of instances to keep warm | 1 |
| --max-instances | Maximum number of instances allowed | 10 |
| --concurrency | Requests each instance can handle at once | 80 |
| --region | Cloud Run service region | us-central1 |
| --platform | Target platform (managed, gke, or kubernetes) | managed |
Key Takeaways
Set --min-instances and --max-instances to control Cloud Run auto scaling limits.
Use concurrency to optimize how many requests each instance handles before scaling.
Avoid setting min-instances too high to prevent unnecessary costs.
Always specify region and platform to ensure correct deployment.
Cloud Run automatically scales instances based on traffic within your set limits.
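To see how concurrency changes the number of instances needed, a ceiling division gives a first estimate. The sketch below is a simplification that assumes request count is the only scaling signal; the actual autoscaler may weigh other factors, and the function name instances_needed and the load figure of 400 in-flight requests are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Rough estimate: instances required to serve a given number of
# in-flight requests at a given per-instance concurrency.
instances_needed() {
  local in_flight="$1" concurrency="$2"
  # Integer ceiling division.
  echo $(( (in_flight + concurrency - 1) / concurrency ))
}

low=$(instances_needed 400 10)    # 40 instances
high=$(instances_needed 400 80)   # 5 instances
echo "concurrency 10: $low instances; concurrency 80: $high instances"
```

Comparing the estimate against --max-instances shows whether the configured cap leaves headroom for the expected peak.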