
Request-based auto scaling in GCP - Commands & Configuration

Introduction
Sometimes your app gets more visitors and needs more computers (instances) to handle the extra work. Request-based auto scaling adds or removes instances automatically based on how many requests your app receives. This keeps your app fast under load and saves money by not running instances when they are not needed.
Typical situations where it helps:
When your website traffic changes a lot during the day and you want to handle more visitors without slowing down.
When you run an online store that gets busy during sales and quiet at other times.
When you have a mobile app backend that needs to quickly respond to user requests without delay.
When you want to save money by only using the computers you need at the moment.
When you want your app to stay available even if many people use it at the same time.
Config File - app.yaml
app.yaml
runtime: python39
instance_class: F2
automatic_scaling:
  target_request_count_per_instance: 10
  min_instances: 1
  max_instances: 5

This file tells Google App Engine how to run your app.

runtime sets the programming language and version.

instance_class chooses the size of the computer running your app.

automatic_scaling controls how many instances run based on requests.

target_request_count_per_instance is the number of requests each instance should handle before App Engine adds more instances.

min_instances and max_instances set the limits for scaling.
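How the target and the two limits interact can be sketched in a few lines of Python. This is a simplified model for illustration only, not App Engine's actual scheduling algorithm: the function name `desired_instances` and the rule of dividing the current request count by the target are assumptions, and the defaults simply mirror the app.yaml values above.

```python
import math


def desired_instances(current_requests: int,
                      target_per_instance: int = 10,
                      min_instances: int = 1,
                      max_instances: int = 5) -> int:
    """Estimate how many instances a request-based autoscaler would aim for.

    Hypothetical helper: a simplified model, not App Engine's real scheduler.
    """
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    if min_instances > max_instances:
        raise ValueError("min_instances must not exceed max_instances")
    # Instances needed so that no instance exceeds the target (rounded up).
    needed = math.ceil(current_requests / target_per_instance)
    # Clamp the result to the configured limits.
    return max(min_instances, min(needed, max_instances))


if __name__ == "__main__":
    for load in (0, 10, 25, 80):
        print(load, "requests ->", desired_instances(load), "instances")
```

With these defaults, 25 concurrent requests would call for 3 instances, while 80 requests would be capped at the max_instances limit of 5.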

Commands
This command uploads your app and the scaling settings to Google Cloud so it can start running with request-based auto scaling.
Terminal
gcloud app deploy app.yaml
Expected Output
Services to deploy:

descriptor:      [app.yaml]
source:          [/home/user/myapp]
target project:  [my-gcp-project]
target service:  [default]

Do you want to continue (Y/n)?  y

Beginning deployment...
Updating service [default]...
................................................................................................
Deployed service [default] to [https://my-gcp-project.uc.r.appspot.com]

You can stream logs from the command line by running:
  gcloud app logs tail -s default

To view your application in the web browser run:
  gcloud app browse
This command opens your deployed app in the web browser so you can see it working and test the scaling behavior.
Terminal
gcloud app browse
Expected Output
Launching default service in browser...
This command lists the instances currently running your app, so you can count them and check whether auto scaling is working as expected.
Terminal
gcloud app instances list
Expected Output
SERVICE  VERSION   INSTANCE    VM_ID                             VM_IP          STATE
default  20240601  instance-1  1234567890abcdef1234567890abcdef  35.233.123.45  RUNNING
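To check scaling over time, it can help to count the running instances programmatically instead of reading the table by eye. The sketch below parses text in the column layout shown in the expected output; the helper name `count_running_instances` is hypothetical, and the parser assumes a header row containing a STATE column and no empty cells. If your gcloud version prints different columns, the header lookup would need adjusting.

```python
# Sample text copied from the expected output of `gcloud app instances list`.
SAMPLE_LISTING = """\
SERVICE  VERSION   INSTANCE    VM_ID                             VM_IP          STATE
default  20240601  instance-1  1234567890abcdef1234567890abcdef  35.233.123.45  RUNNING
"""


def count_running_instances(listing: str) -> int:
    """Count rows whose STATE column reads RUNNING.

    Hypothetical helper: assumes whitespace-separated columns with a header
    row that includes STATE, as in the sample listing.
    """
    rows = [line.split() for line in listing.splitlines() if line.strip()]
    if not rows:
        return 0
    header, body = rows[0], rows[1:]
    state_index = header.index("STATE")  # raises ValueError if the column is missing
    return sum(
        1 for row in body
        if len(row) > state_index and row[state_index] == "RUNNING"
    )


if __name__ == "__main__":
    print("running instances:", count_running_instances(SAMPLE_LISTING))
```

To use it on live data, you could capture the command's text with `subprocess.run(["gcloud", "app", "instances", "list"], capture_output=True, text=True, check=True).stdout` and pass that string to the function.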
Key Concept

If you remember nothing else from this pattern, remember: request-based auto scaling adjusts the number of app instances automatically based on how many requests each instance receives to keep your app fast and efficient.

Common Mistakes
Setting min_instances and max_instances to the same number.
This disables scaling because the number of instances cannot change, so your app cannot handle more or fewer requests dynamically.
Set min_instances to a low number like 1 and max_instances to a higher number like 5 to allow scaling up and down.
Setting target_request_count_per_instance too high.
Each instance will get too many requests before new instances are added and may become slow or unresponsive, causing a poor user experience.
Choose a reasonable target, like the 10 requests per instance used in the config file, to balance load and performance.
Not deploying the app after changing the scaling settings.
Changes in the config file won't take effect until you deploy, so the app won't scale as expected.
Always run 'gcloud app deploy app.yaml' after editing the config file.
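The first two mistakes can be caught before deploying with a small check on the automatic_scaling values. This is a minimal sketch, assuming the settings have already been loaded into a Python dict; the function name `find_scaling_mistakes` and the `high_target` threshold of 100 are illustrative assumptions, not Google recommendations. The third mistake (forgetting to deploy) cannot be detected this way.

```python
def find_scaling_mistakes(automatic_scaling: dict, high_target: int = 100) -> list:
    """Return warning strings for common automatic_scaling mistakes.

    Hypothetical helper; `high_target` is an arbitrary illustrative threshold.
    """
    warnings = []
    min_n = automatic_scaling.get("min_instances")
    max_n = automatic_scaling.get("max_instances")
    target = automatic_scaling.get("target_request_count_per_instance")

    if min_n is not None and max_n is not None:
        if min_n == max_n:
            warnings.append("min_instances equals max_instances, so the instance count cannot change")
        elif min_n > max_n:
            warnings.append("min_instances is greater than max_instances")

    if target is not None and target > high_target:
        warnings.append("target_request_count_per_instance looks too high")

    return warnings


if __name__ == "__main__":
    # Values taken from the app.yaml in this guide.
    config = {
        "target_request_count_per_instance": 10,
        "min_instances": 1,
        "max_instances": 5,
    }
    print(find_scaling_mistakes(config) or "no obvious mistakes")
```

An empty list means none of the checked mistakes were found; it does not guarantee that the settings suit your traffic.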
Summary
Create an app.yaml file with automatic_scaling settings to control request-based scaling.
Deploy the app using 'gcloud app deploy app.yaml' to apply the scaling configuration.
Use 'gcloud app instances list' to check how many instances are running and verify scaling behavior.