Introduction

Reliability design principles help keep cloud services working well even when things go wrong. They guide how to build systems that recover quickly and avoid failures.

Apply these principles in situations like the following:

When you want your website to stay online even if a server crashes

When you need your app to handle sudden traffic spikes without breaking

When you want to automatically fix problems without manual work

When you want to keep your data safe and available during outages

When you want to test how your system behaves under failure conditions

Commands

This command creates a virtual machine that automatically restarts if it crashes, improving reliability.

Terminal

gcloud compute instances create example-instance --zone=us-central1-a --restart-on-failure

Expected Output

Created [https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-a/instances/example-instance].
NAME              ZONE           MACHINE_TYPE  PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP   STATUS
example-instance  us-central1-a  e2-medium                  10.128.0.2   34.68.123.45  RUNNING

--restart-on-failure - Restarts the VM automatically if Compute Engine terminates it, for example after a crash; it does not apply when a user stops the VM

--zone - Specifies the zone where the VM is created
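To confirm that automatic restart is actually set on a VM, one option is to read the scheduling.automaticRestart field with gcloud compute instances describe. The sketch below wraps that call in a small helper; auto_restart_enabled is a hypothetical name introduced here for illustration, and the comparison assumes gcloud prints the boolean as True or False.

```shell
# Sketch: succeed (exit 0) only if automatic restart is enabled on a VM.
# auto_restart_enabled is a hypothetical helper, not part of gcloud.

# Usage: auto_restart_enabled INSTANCE_NAME ZONE
auto_restart_enabled() {
  local value
  # Ask gcloud for the single scheduling field; return 2 if the call fails.
  value=$(gcloud compute instances describe "$1" --zone="$2" \
    --format="value(scheduling.automaticRestart)") || return 2
  # Assumption: the value is printed as True or False.
  case "$value" in
    [Tt]rue) return 0 ;;
    *) return 1 ;;
  esac
}

# Example call (requires gcloud and an existing VM):
# auto_restart_enabled example-instance us-central1-a && echo "auto restart is on"
```

Returning a distinct code when the gcloud call itself fails keeps "setting is off" separate from "could not check".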

This command checks that the VM is running and ready to serve requests.

Terminal

gcloud compute instances list --filter="name=example-instance"

Expected Output

NAME              ZONE           MACHINE_TYPE  PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP   STATUS
example-instance  us-central1-a  e2-medium                  10.128.0.2   34.68.123.45  RUNNING

--filter - Filters the list to show only the named instance
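A single status check can miss a VM that is still starting. As a rough sketch, the check can be repeated until the status reads RUNNING or a retry limit is reached. The helper names wait_for_running and instance_status are hypothetical and introduced only for this example; the attempt count and delay are arbitrary.

```shell
# Sketch: poll a status command until it prints RUNNING or attempts run out.
# wait_for_running and instance_status are hypothetical helpers, not gcloud features.

# Usage: wait_for_running MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for_running() {
  local max_attempts="$1" delay="$2" attempt=1 status=""
  shift 2
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the caller-supplied command; treat a failed call as UNKNOWN.
    status=$("$@" 2>/dev/null) || status="UNKNOWN"
    if [ "$status" = "RUNNING" ]; then
      echo "RUNNING after ${attempt} attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    # Skip the pause once the last attempt has been used.
    if [ "$attempt" -le "$max_attempts" ]; then sleep "$delay"; fi
  done
  echo "gave up after ${max_attempts} attempt(s); last status: ${status}" >&2
  return 1
}

# Prints only the status of the example VM created earlier.
instance_status() {
  gcloud compute instances describe example-instance \
    --zone=us-central1-a --format="value(status)"
}

# Example call (requires gcloud and an existing VM):
# wait_for_running 10 5 instance_status
```

Passing the status command as an argument keeps the retry loop independent of gcloud, so the same loop could poll any other check.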

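This command creates a managed instance group of identical VMs that automatically replaces any VM that fails, increasing availability. It assumes an instance template named example-template already exists in the project.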

Terminal

gcloud compute instance-groups managed create example-group --base-instance-name example-instance --size 2 --template example-template --zone us-central1-a

Expected Output

Created [https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-a/instanceGroupManagers/example-group].

--size - Number of VMs in the group

--template - Instance template to use for creating VMs
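The group command refers to example-template, which the walkthrough does not create. One possible way to provide it is sketched below using gcloud compute instance-templates create. The helper name ensure_template is hypothetical, and the image family and image project are assumptions chosen only to make the example complete; the e2-medium machine type matches the output shown earlier.

```shell
# Sketch: create the instance template only if it does not already exist.
# ensure_template is a hypothetical helper; the image family and project
# below are assumptions, not values taken from this tutorial.

# Usage: ensure_template TEMPLATE_NAME MACHINE_TYPE
ensure_template() {
  if gcloud compute instance-templates describe "$1" >/dev/null 2>&1; then
    echo "template $1 already exists"
  else
    gcloud compute instance-templates create "$1" \
      --machine-type="$2" \
      --image-family=debian-12 \
      --image-project=debian-cloud
  fi
}

# Example call (requires gcloud and a configured project):
# ensure_template example-template e2-medium
```

Checking with describe first makes the step safe to rerun, which suits setup scripts that may run more than once.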

This command shows the current VMs in the managed group and their status to verify reliability setup.

Terminal

gcloud compute instance-groups managed list-instances example-group --zone us-central1-a

Expected Output

INSTANCE_NAME           ZONE           STATUS
example-instance-abcde  us-central1-a  RUNNING
example-instance-fghij  us-central1-a  RUNNING
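Reading this table by eye does not scale to larger groups. A minimal sketch of an automated check is shown below: it reads the table on standard input and reports any instance whose STATUS is not RUNNING. The helper name report_unhealthy is hypothetical, and the parsing assumes whitespace-separated columns with a header row containing STATUS, as in the output above; real gcloud output may include additional columns.

```shell
# Sketch: read list-instances table output on stdin and report any
# instance whose STATUS column is not RUNNING.
# report_unhealthy is a hypothetical helper, not part of gcloud.
# Exit codes: 0 = all RUNNING, 1 = at least one not RUNNING,
#             2 = no STATUS column found (or empty input).
report_unhealthy() {
  awk '
    NR == 1 {
      # Locate the STATUS column by name so its position can vary.
      for (i = 1; i <= NF; i++) if ($i == "STATUS") col = i
      next
    }
    col && $col != "RUNNING" { print $1 ": " $col; bad++ }
    END {
      if (!col) exit 2
      exit (bad > 0)
    }
  '
}

# Example call (requires gcloud and an existing managed instance group):
# gcloud compute instance-groups managed list-instances example-group \
#   --zone us-central1-a | report_unhealthy
```

The nonzero exit code lets a monitoring script or scheduled job react when an instance is not running.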

Key Concept

If you remember nothing else from this pattern, remember: design your cloud resources to automatically recover from failures without manual intervention.

Common Mistakes

Not enabling automatic restart on virtual machines

If a VM crashes, it stays down and causes downtime.

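Keep --restart-on-failure enabled when creating VMs so they restart automatically. gcloud enables it by default, so avoid passing --no-restart-on-failure unless you have a specific reason to turn it off.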

Creating single VMs without managed instance groups

Single VMs have no backup and cause service interruption if they fail.

Use managed instance groups to maintain multiple copies and auto-replace failed VMs.

Not verifying instance status after creation

You might think your service is running when it is not ready or failed.

Use commands like 'gcloud compute instances list' to check VM status regularly.

Summary

Create virtual machines with automatic restart to recover from crashes.

Use managed instance groups to keep multiple copies of your service running.

Check the status of your instances to ensure they are healthy and available.