Google Cloud Run allows you to configure concurrency for your container instances. What is the default concurrency setting for Cloud Run services?
Think about how Cloud Run optimizes resource use by handling multiple requests per container.
By default, Cloud Run sets concurrency to 80, meaning each container instance can handle up to 80 requests at the same time. Because one instance serves many requests, fewer instances have to be started, which improves resource utilization and reduces cold starts.
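As a rough illustration of what the concurrency setting means for instance count, the sketch below estimates a lower bound on the number of instances needed to hold a given number of in-flight requests. The function name is hypothetical, and the calculation deliberately ignores CPU, memory, and the autoscaler's actual behavior; it only shows the arithmetic.

```python
import math

DEFAULT_CONCURRENCY = 80  # Cloud Run's default, as stated in the answer above


def instances_needed(in_flight_requests: int,
                     concurrency: int = DEFAULT_CONCURRENCY) -> int:
    """Lower bound on instances needed to hold the in-flight requests.

    Hypothetical helper: it models only the per-instance request limit.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    if in_flight_requests < 0:
        raise ValueError("in_flight_requests cannot be negative")
    return math.ceil(in_flight_requests / concurrency)
```

With these assumptions, 200 simultaneous requests fit on 3 instances at the default concurrency, but would need 200 instances if concurrency were set to 1.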
In the Google App Engine standard environment, each instance can handle a limited number of concurrent requests. What does App Engine do when an instance reaches this limit?
Think about how App Engine scales to handle more traffic.
When an instance reaches its concurrency limit, App Engine's automatic scaling starts new instances to handle the additional requests.
When your GKE cluster scales up to handle more pods, which practice helps maintain security effectively?
Think about how to keep pods secure even as new ones are added automatically.
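Using Pod Security Admission ensures that all pods, including new ones created during scaling, must meet the security standard enforced on their namespace, which reduces risk. Pod Security Admission replaces Pod Security Policies, which were deprecated in Kubernetes 1.21 and removed in 1.25, so Pod Security Policies apply only to older clusters.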
You want your managed instance group (MIG) to add or remove VM instances automatically based on CPU load. Which configuration snippet correctly sets autoscaling to target 60% average CPU utilization?
Check the correct flag name and value format for CPU utilization target.
The correct flag is --target-cpu-utilization, and its value is a decimal between 0 and 1 representing the target CPU usage as a fraction, so a 60% target is written as --target-cpu-utilization=0.6.
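To make the percent-to-fraction conversion concrete, here is a minimal sketch that builds the argument list for `gcloud compute instance-groups managed set-autoscaling`. The helper name, the group name `web-mig`, the zone, and the replica limit are hypothetical values chosen for illustration; the sketch only assembles the command and does not run it.

```python
import shlex
from typing import List


def autoscaling_command(group: str, zone: str, max_replicas: int,
                        cpu_percent: float) -> List[str]:
    """Build a gcloud set-autoscaling command for a CPU utilization target.

    Hypothetical helper: cpu_percent is given as a percentage (e.g. 60)
    and converted to the 0-1 fraction that the flag expects.
    """
    if not 0 < cpu_percent <= 100:
        raise ValueError("cpu_percent must be in the range (0, 100]")
    target = cpu_percent / 100
    return [
        "gcloud", "compute", "instance-groups", "managed",
        "set-autoscaling", group,
        f"--zone={zone}",
        f"--max-num-replicas={max_replicas}",
        f"--target-cpu-utilization={target:g}",
    ]


# Example values below are assumptions, not taken from the quiz.
cmd = autoscaling_command("web-mig", "us-central1-a", 10, 60)
print(shlex.join(cmd))
```

Passing 60 yields the flag `--target-cpu-utilization=0.6`; passing the percentage directly to the flag (for example `60`) is the mistake the question is probing for.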
Consider a web application running in the cloud. Why do architects usually prefer horizontal scaling (adding more machines) instead of vertical scaling (adding more power to one machine)?
Think about what happens if one machine fails in each scaling approach.
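Horizontal scaling adds more machines, so if one fails, the others keep serving requests. It also spreads traffic across machines, improving performance and reliability. Vertical scaling is capped by the largest machine size available, leaves a single point of failure, and resizing the machine can cause downtime.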
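The failure argument can be put in numbers with a small sketch. It assumes identical machines with load spread evenly across them, which is a simplification, and the function name is hypothetical.

```python
def capacity_after_one_failure(machines: int) -> float:
    """Fraction of total capacity left when one machine fails.

    Assumes identical machines sharing the load evenly (a simplification).
    """
    if machines < 1:
        raise ValueError("machines must be at least 1")
    return (machines - 1) / machines
```

Under these assumptions, a single vertically scaled machine leaves 0% capacity after a failure, while a fleet of four horizontally scaled machines still has 75%.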