How to Configure Auto Scaling in Cloud Run on GCP

To configure auto scaling in Cloud Run, set the --min-instances and --max-instances flags when you deploy a service (gcloud run deploy) or update an existing one (gcloud run services update). These flags set the minimum and maximum number of container instances, and Cloud Run automatically scales between those limits based on incoming request load.

Syntax

Use the gcloud run deploy command with flags to control auto scaling:

  • --min-instances: Minimum number of container instances to keep running (default 0, which lets the service scale to zero).
  • --max-instances: Maximum number of container instances Cloud Run may start.
  • --concurrency: Maximum number of requests each instance handles simultaneously (default 80). Lower values make Cloud Run add instances sooner.
bash
gcloud run deploy SERVICE_NAME \
  --image IMAGE_URL \
  --min-instances MIN_INSTANCES \
  --max-instances MAX_INSTANCES \
  --concurrency CONCURRENCY
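
The same scaling flags can also be applied to a service that already exists, without redeploying the image, through gcloud run services update. The sketch below assumes a hypothetical helper named update_scaling and a DRY_RUN switch, neither of which is part of gcloud. By default it only prints the command it would run.

```bash
# Sketch: change scaling limits on an existing Cloud Run service.
# update_scaling and DRY_RUN are illustrative names, not gcloud features.
update_scaling() {
  service="$1"; region="$2"; min="$3"; max="$4"

  # Build the command as positional parameters so quoting is preserved.
  set -- gcloud run services update "$service" \
    --region "$region" \
    --min-instances "$min" \
    --max-instances "$max"

  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # print the command instead of running it
  else
    "$@"        # run it for real (requires an authenticated gcloud)
  fi
}

# Prints the command; set DRY_RUN=0 to execute it.
update_scaling hello-service us-central1 1 5
```

Printing first makes it easy to review the limits before they are applied to a live service.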

Example

This example deploys a Cloud Run service named hello-service with a minimum of 1 instance, a maximum of 5 instances, and concurrency set to 80 requests per instance. This setup allows Cloud Run to automatically scale between 1 and 5 instances based on traffic.

bash
gcloud run deploy hello-service \
  --image gcr.io/cloudrun/hello \
  --min-instances 1 \
  --max-instances 5 \
  --concurrency 80 \
  --region us-central1 \
  --platform managed
Output
Deploying service [hello-service]... Done.
Service URL: https://hello-service-xyz.a.run.app
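
As a rough sanity check on these numbers, the ceiling on simultaneous requests is the maximum instance count multiplied by the per-instance concurrency: 5 × 80 = 400 for this example. The helper names below are hypothetical, and the estimate assumes every instance can actually sustain its full concurrency setting, which depends on the workload.

```bash
# Illustrative helpers (not part of gcloud); plain integer arithmetic only.

# Upper bound on in-flight requests: max instances x per-instance concurrency.
max_in_flight() {
  echo $(( $1 * $2 ))
}

# Instances needed for a given number of simultaneous requests (ceiling division).
instances_for() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

max_in_flight 5 80    # prints 400
instances_for 200 80  # prints 3
```

If expected peak traffic exceeds the first number, either --max-instances or --concurrency needs to go up.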

Common Pitfalls

  • Setting --min-instances too high can increase costs because instances run even with no traffic.
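  • Setting --max-instances too low can cause request delays or errors when traffic spikes: once every instance is at its concurrency limit, new requests queue and may eventually be rejected (typically with HTTP 429).
  • Ignoring concurrency settings may lead to inefficient scaling; higher concurrency means fewer instances are needed for the same load, provided each instance has enough CPU and memory to serve the requests.
  • Not specifying a region can stall or fail a deployment: if no default is configured (gcloud config set run/region REGION), gcloud prompts for one, which breaks non-interactive scripts. --platform defaults to managed in current gcloud releases, but stating it explicitly avoids surprises from older configurations.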
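bash
# Wrong: the minimum exceeds the maximum, and no region is given
gcloud run deploy my-service --image gcr.io/PROJECT_ID/my-image --min-instances 10 --max-instances 2

# Right: minimum at or below the maximum, with region and platform stated
gcloud run deploy my-service --image gcr.io/PROJECT_ID/my-image --min-instances 1 --max-instances 10 --region us-central1 --platform managed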
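
A deployment script can catch the first mistake before it ever calls gcloud. The following is a minimal sketch of such a guard. validate_scaling is a hypothetical helper, not a gcloud feature, and it only checks that both limits are non-negative integers with the minimum no larger than the maximum.

```bash
# Returns 0 when the limits are usable, 1 (with a message on stderr) otherwise.
validate_scaling() {
  min="$1"; max="$2"

  # Reject empty or non-numeric values.
  for value in "$min" "$max"; do
    case "$value" in
      ''|*[!0-9]*)
        echo "scaling limits must be non-negative integers" >&2
        return 1 ;;
    esac
  done

  # Reject a minimum that exceeds the maximum.
  if [ "$min" -gt "$max" ]; then
    echo "--min-instances ($min) exceeds --max-instances ($max)" >&2
    return 1
  fi
  return 0
}

# Example: only deploy when the limits make sense.
if validate_scaling 1 10; then
  echo "limits OK"
fi
```

Failing early with a specific message is usually easier to debug than a rejected deployment halfway through a pipeline.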

Quick Reference

| Flag | Description | Example Value |
|------|-------------|---------------|
| --min-instances | Minimum number of instances to keep warm | 1 |
| --max-instances | Maximum number of instances allowed | 10 |
| --concurrency | Requests each instance can handle at once | 80 |
| --region | Cloud Run service region | us-central1 |
| --platform | Target platform (managed, gke, or kubernetes) | managed |
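
To confirm which of these values a deployed service actually uses, one option is to export its configuration with gcloud run services describe SERVICE --format export and read the scaling fields from the YAML. In that output the instance limits appear as autoscaling.knative.dev/minScale and autoscaling.knative.dev/maxScale annotations, and concurrency as containerConcurrency. The extract_scaling filter below is a hypothetical helper and does simple line matching, not full YAML parsing.

```bash
# Reads exported service YAML on stdin and prints key=value lines for the
# scaling settings it finds. A setting left at its default may be absent.
extract_scaling() {
  awk -F': *' '
    /autoscaling\.knative\.dev\/(min|max)Scale:/ || /containerConcurrency:/ {
      key = $1
      sub(/^ +/, "", key)      # strip indentation
      sub(/^.*\//, "", key)    # strip the annotation prefix, if any
      val = $2
      gsub(/[^0-9]/, "", val)  # strip quotes around the number
      print key "=" val
    }'
}

# Usage against a real service (requires an authenticated gcloud):
# gcloud run services describe hello-service --region us-central1 --format export | extract_scaling
```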

Key Takeaways

  • Set --min-instances and --max-instances to control Cloud Run auto scaling limits.
  • Use --concurrency to tune how many requests each instance handles before Cloud Run adds another.
  • Avoid setting --min-instances too high to prevent unnecessary costs.
  • Pass --region explicitly (or configure a default) so deployments do not depend on an interactive prompt, and state --platform when in doubt.
  • Cloud Run automatically scales instances based on traffic within your set limits.