How to Configure Auto Scaling in Cloud Run on GCP

To configure auto scaling in Cloud Run, set the --min-instances and --max-instances flags when you deploy or update a service. These flags set the lower and upper bounds on the number of container instances. Within those bounds, Cloud Run adds and removes instances automatically based on incoming request load. If --min-instances is not set, it defaults to 0, so the service scales to zero when idle.

📐

Syntax

Use the gcloud run deploy command with flags to control auto scaling:

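--min-instances: Minimum number of container instances to keep running, even when there is no traffic.
--max-instances: Maximum number of container instances Cloud Run may start.
--concurrency: Maximum number of requests each instance handles simultaneously (default 80). A lower value makes Cloud Run add instances sooner; a higher value serves the same load with fewer instances.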

bash

gcloud run deploy SERVICE_NAME \
  --image IMAGE_URL \
  --min-instances MIN_INSTANCES \
  --max-instances MAX_INSTANCES \
  --concurrency CONCURRENCY
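The same scaling flags are also accepted by gcloud run services update, which changes an existing service without rebuilding or re-specifying the image. The following is a minimal sketch, assuming the service already exists in the given region; scale_service is a hypothetical helper name, not a gcloud command.

```shell
# scale_service is a hypothetical wrapper, not part of gcloud.
# Usage: scale_service SERVICE REGION MIN_INSTANCES MAX_INSTANCES
scale_service() {
  # $1 = service name, $2 = region, $3 = min instances, $4 = max instances
  gcloud run services update "$1" \
    --region "$2" \
    --min-instances "$3" \
    --max-instances "$4"
}

# Example call (commented out so the snippet has no side effects):
# scale_service hello-service us-central1 1 5
```

Wrapping the command in a function keeps the limits in one place, which can help when the same values are applied to several services.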

💻

Example

This example deploys a Cloud Run service named hello-service with a minimum of 1 instance, a maximum of 5 instances, and concurrency set to 80 requests per instance. This setup allows Cloud Run to automatically scale between 1 and 5 instances based on traffic.

bash

gcloud run deploy hello-service \
  --image gcr.io/cloudrun/hello \
  --min-instances 1 \
  --max-instances 5 \
  --concurrency 80 \
  --region us-central1 \
  --platform managed

Output

Deploying service [hello-service]... Done.
Service URL: https://hello-service-xyz.a.run.app
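To see what these limits mean for capacity, it helps to do the arithmetic: the service can handle at most max-instances × concurrency requests at the same time, which is 5 × 80 = 400 in this example. The sketch below estimates how many instances a given number of in-flight requests would need. The helper name estimate_instances and the load figures are illustrative assumptions, and the real autoscaler also takes CPU utilization into account, so treat the result as a rough guide only.

```shell
# estimate_instances is a hypothetical helper, not a gcloud feature.
# Usage: estimate_instances IN_FLIGHT_REQUESTS CONCURRENCY MIN MAX
# Prints the instance count needed for that many simultaneous requests,
# clamped to the range [MIN, MAX].
estimate_instances() {
  in_flight="$1"; concurrency="$2"; min="$3"; max="$4"
  # Ceiling division using integer arithmetic.
  needed=$(( (in_flight + concurrency - 1) / concurrency ))
  if [ "$needed" -lt "$min" ]; then needed="$min"; fi
  if [ "$needed" -gt "$max" ]; then needed="$max"; fi
  echo "$needed"
}

estimate_instances 200 80 1 5    # prints 3
estimate_instances 1000 80 1 5   # prints 5 (capped by --max-instances)
estimate_instances 0 80 1 5      # prints 1 (floor set by --min-instances)
```

In the second call the load exceeds what 5 instances can serve at once, which is the situation where raising --max-instances or --concurrency is worth considering.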

⚠️

Common Pitfalls

Setting --min-instances too high can increase costs because instances run even with no traffic.
Setting --max-instances too low can cause request delays or errors when traffic spikes.
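Ignoring concurrency settings may lead to inefficient scaling: higher concurrency means fewer instances are needed, but each instance must have enough CPU and memory to handle that many simultaneous requests.
Not specifying --region can make gcloud prompt for one, which stalls non-interactive scripts, or fall back to a configured default region you did not intend. In current gcloud releases, --platform typically defaults to managed.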

bash

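# Wrong: --min-instances is greater than --max-instances, and no region is given
gcloud run deploy my-service --image gcr.io/PROJECT_ID/my-image --min-instances 10 --max-instances 2

# Right: min does not exceed max, and region and platform are explicit
gcloud run deploy my-service --image gcr.io/PROJECT_ID/my-image --min-instances 1 --max-instances 10 --region us-central1 --platform managed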
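Because a minimum above the maximum is an invalid configuration, a deployment script can check the two values before it calls gcloud at all. The following is a minimal sketch; check_scaling_limits is a hypothetical helper name, and the error message wording is an assumption rather than gcloud's own output.

```shell
# check_scaling_limits is a hypothetical helper, not part of gcloud.
# Usage: check_scaling_limits MIN_INSTANCES MAX_INSTANCES
# Returns 0 when the limits are consistent, 1 otherwise.
check_scaling_limits() {
  min="$1"; max="$2"
  if [ "$min" -gt "$max" ]; then
    echo "error: --min-instances ($min) exceeds --max-instances ($max)" >&2
    return 1
  fi
  return 0
}

# Example: deploy only when the limits make sense (commented out here).
# check_scaling_limits 1 10 && gcloud run deploy my-service \
#   --image IMAGE_URL --min-instances 1 --max-instances 10 \
#   --region us-central1 --platform managed
```

Failing early in the script gives a clearer message than waiting for the deployment request to be rejected.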

📊

Quick Reference

Flag	Description	Example Value
--min-instances	Minimum number of instances to keep warm	1
--max-instances	Maximum number of instances allowed	10
--concurrency	Requests each instance can handle at once	80
--region	Cloud Run service region	us-central1
--platform	Target platform (managed, gke, or kubernetes)	managed

✅

Key Takeaways

Set --min-instances and --max-instances to control Cloud Run auto scaling limits.

Use concurrency to optimize how many requests each instance handles before scaling.

Avoid setting min-instances too high to prevent unnecessary costs.

Specify the region, and the platform if you do not want the default, so deployments behave predictably in scripts.

Cloud Run automatically scales instances based on traffic within your set limits.