Reliability pillar principles in Azure - Commands & Configuration

Introduction
Reliability means your cloud services keep working even when something goes wrong. Designing for it reduces downtime by detecting failures early and recovering from them with little or no manual effort. The practices on this page are useful in situations like these:
When you want your website to stay online even if a server crashes
When you need to recover quickly from unexpected errors in your app
When you want to test how your system behaves under failure conditions
When you want to automatically fix problems without manual intervention
When you want to monitor your services to catch issues early
Commands
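This command creates a metric alert rule in Azure Monitor that fires when the average CPU usage of a virtual machine exceeds 80% over a 5-minute window. It helps detect problems early to keep your service reliable. To receive a notification when the alert fires, attach an action group with the --action flag.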
Terminal
az monitor metrics alert create --name HighCPUAlert --resource-group example-rg --scopes /subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.Compute/virtualMachines/example-vm --condition "avg Percentage CPU > 80" --description "Alert when CPU usage is high"
Expected Output
{"id":"/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/microsoft.insights/metricAlerts/HighCPUAlert","name":"HighCPUAlert","type":"microsoft.insights/metricalerts","location":"global","tags":{},"properties":{"description":"Alert when CPU usage is high","enabled":true,"scopes":["/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.Compute/virtualMachines/example-vm"],"evaluationFrequency":"PT1M","windowSize":"PT5M","criteria":{"allOf":[{"criterionType":"StaticThresholdCriterion","metricName":"Percentage CPU","metricNamespace":"Microsoft.Compute/virtualMachines","operator":"GreaterThan","threshold":80,"timeAggregation":"Average"}]},"autoMitigate":true}}
--condition - Defines the metric and threshold to trigger the alert
--scopes - Specifies the resource to monitor
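The command above relies on defaults for how often the rule is evaluated and over what window (the PT1M and PT5M values in the output). A minimal sketch that builds the resource ID from variables and states those settings explicitly could look like the following. The `run` helper and the `DRY_RUN` variable are illustrative assumptions, not part of the Azure CLI: by default the script only prints the command so you can review it before executing it.

```shell
# DRY_RUN and run() are hypothetical helpers for this sketch, not Azure CLI features.
# With DRY_RUN=1 (the default) the command is printed (without shell quoting)
# instead of being executed. Set DRY_RUN=0 to call Azure for real.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

subscription_id="00000000-0000-0000-0000-000000000000"
resource_group="example-rg"
vm_name="example-vm"

# Build the full resource ID once instead of pasting it into the command.
vm_id="/subscriptions/${subscription_id}/resourceGroups/${resource_group}/providers/Microsoft.Compute/virtualMachines/${vm_name}"

# Same alert as above, with the evaluation window and frequency spelled out.
alert_cmd="$(run az monitor metrics alert create \
  --name HighCPUAlert \
  --resource-group "$resource_group" \
  --scopes "$vm_id" \
  --condition "avg Percentage CPU > 80" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --description "Alert when CPU usage is high")"

echo "$alert_cmd"
```

Keeping the resource ID in a variable makes it easier to reuse the same script for other VMs by changing only `vm_name`.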
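This command creates an availability set for virtual machines. Azure spreads the VMs in the set across fault domains (separate hardware, power, and network) and update domains (groups rebooted at different times during planned maintenance), which reduces downtime if one server fails or is being updated.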
Terminal
az vm availability-set create --name example-avset --resource-group example-rg --platform-fault-domain-count 2 --platform-update-domain-count 5
Expected Output
{"id":"/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.Compute/availabilitySets/example-avset","name":"example-avset","type":"Microsoft.Compute/availabilitySets","location":"eastus","properties":{"platformFaultDomainCount":2,"platformUpdateDomainCount":5}}
--platform-fault-domain-count - Number of fault domains to spread VMs across
--platform-update-domain-count - Number of update domains to spread VMs across
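To see why the two counts matter, the following sketch simulates placing six VMs into 2 fault domains and 5 update domains. It assumes a simple round-robin assignment purely for illustration; Azure decides the real placement, and the function name `show_placement` is hypothetical.

```shell
# Illustration only: assumes round-robin placement, which Azure does not guarantee.
fault_domains=2
update_domains=5

# Print the fault domain and update domain each VM index would land in.
show_placement() {
  vm_count="$1"
  i=0
  while [ "$i" -lt "$vm_count" ]; do
    fd=$((i % fault_domains))
    ud=$((i % update_domains))
    echo "vm$i fault-domain=$fd update-domain=$ud"
    i=$((i + 1))
  done
}

placement="$(show_placement 6)"
echo "$placement"
```

Under this assumption, a hardware failure in one fault domain affects three of the six VMs, and maintenance on one update domain reboots at most two of them at a time.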
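This command creates a virtual machine inside the availability set so that it benefits from fault and update domain protection. A VM can only be placed in an availability set at creation time, and the set protects you only when it contains two or more VMs.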
Terminal
az vm create --resource-group example-rg --name example-vm1 --image Ubuntu2204 --availability-set example-avset --admin-username azureuser --generate-ssh-keys
Expected Output
{ "fqdns": "", "id": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.Compute/virtualMachines/example-vm1", "location": "eastus", "name": "example-vm1", "powerState": "VM running", "resourceGroup": "example-rg", "zones": null }
--availability-set - Assigns the VM to the availability set for reliability
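An availability set only helps once it holds more than one VM. As a hedged sketch, the loop below adds further VMs to the same set; the names `example-vm2` and `example-vm3`, the `create_vms` function, and the `run`/`DRY_RUN` helpers are assumptions made for illustration. By default the commands are printed rather than executed.

```shell
# DRY_RUN and run() are hypothetical helpers for this sketch, not Azure CLI features.
# With DRY_RUN=1 (the default) each command is printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

resource_group="example-rg"
availability_set="example-avset"

# Create one VM per name given, all in the same availability set.
# Ubuntu2204 is a current Azure CLI image alias.
create_vms() {
  for vm_name in "$@"; do
    run az vm create \
      --resource-group "$resource_group" \
      --name "$vm_name" \
      --image Ubuntu2204 \
      --availability-set "$availability_set" \
      --admin-username azureuser \
      --generate-ssh-keys
  done
}

created="$(create_vms example-vm2 example-vm3)"
echo "$created"
```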
This command creates a virtual machine scale set with two identical VM instances for high availability. The automatic upgrade policy applies changes to the scale set model to the running instances without manual steps.
Terminal
az vmss create --resource-group example-rg --name example-vmss --image Ubuntu2204 --upgrade-policy-mode automatic --admin-username azureuser --generate-ssh-keys --instance-count 2
Expected Output
{ "id": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.Compute/virtualMachineScaleSets/example-vmss", "location": "eastus", "name": "example-vmss", "provisioningState": "Succeeded", "resourceGroup": "example-rg" }
--upgrade-policy-mode - Controls how changes to the scale set model are rolled out to existing instances (automatic, manual, or rolling)
--instance-count - Number of VM instances to run
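This command creates an autoscale setting that keeps the scale set between 2 and 5 instances. The setting starts with no rules, so instances are added or removed based on demand only after scale rules are defined. Together, the limits and rules help keep the service reliable under changing loads.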
Terminal
az monitor autoscale create --resource-group example-rg --resource example-vmss --resource-type Microsoft.Compute/virtualMachineScaleSets --name example-autoscale --min-count 2 --max-count 5 --count 2
Expected Output
{"id":"/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/microsoft.insights/autoscalesettings/example-autoscale","name":"example-autoscale","type":"microsoft.insights/autoscalesettings","location":"global","properties":{"enabled":true,"targetResourceUri":"/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.Compute/virtualMachineScaleSets/example-vmss","profiles":[{"name":"Auto created profile","capacity":{"minimum":"2","maximum":"5","default":"2"},"rules":[]}],"notifications":[]}}
--min-count - Minimum number of VM instances
--max-count - Maximum number of VM instances
--count - Default number of VM instances
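An autoscale setting with an empty `rules` list, as in the output above, enforces the minimum and maximum but does not yet react to load. A hedged sketch of adding one scale-out and one scale-in rule with `az monitor autoscale rule create` follows. The 70% and 30% thresholds are illustrative assumptions, not recommendations from this guide, and `run`/`DRY_RUN` are hypothetical helpers that print the commands unless you opt in to executing them.

```shell
# DRY_RUN and run() are hypothetical helpers for this sketch, not Azure CLI features.
# With DRY_RUN=1 (the default) each command is printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

resource_group="example-rg"
autoscale_name="example-autoscale"

# Scale out by one instance when average CPU is above 70% over 5 minutes.
scale_out_cmd="$(run az monitor autoscale rule create \
  --resource-group "$resource_group" \
  --autoscale-name "$autoscale_name" \
  --condition "Percentage CPU > 70 avg 5m" \
  --scale out 1)"

# Scale in by one instance when average CPU is below 30% over 5 minutes.
scale_in_cmd="$(run az monitor autoscale rule create \
  --resource-group "$resource_group" \
  --autoscale-name "$autoscale_name" \
  --condition "Percentage CPU < 30 avg 5m" \
  --scale in 1)"

echo "$scale_out_cmd"
echo "$scale_in_cmd"
```

Pairing every scale-out rule with a scale-in rule, and leaving a gap between the two thresholds, helps the instance count settle instead of flipping back and forth.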
Key Concept

If you remember nothing else from this pattern, remember: design your cloud resources to detect problems early and recover automatically to keep your service running smoothly.

Common Mistakes
Not setting up alerts to monitor resource health
Without alerts, you won't know when something goes wrong until users complain
Create metric alerts to get notified immediately about issues
Deploying VMs without availability sets or scale sets
A single VM failure can cause downtime if not spread across fault domains
Use availability sets or scale sets to distribute VMs for higher reliability
Not configuring autoscaling for variable workloads
Your service may become slow or crash under heavy load without enough resources
Set up autoscale rules to add or remove instances automatically
Summary
Create alerts to monitor important metrics and get notified of issues early.
Use availability sets or scale sets to spread resources and avoid single points of failure.
Configure autoscaling to adjust resources automatically based on demand.