Reliability pillar principles in Azure - Commands & Configuration

Introduction
Reliability means making sure your cloud services keep working well even if something goes wrong. It helps avoid downtime and keeps your users happy by handling failures smoothly.
When you want your website to stay online even if a server crashes
When you need to recover quickly from unexpected errors in your app
When you want to test how your system behaves under failure conditions
When you want to automatically fix problems without manual intervention
When you want to monitor your services to catch issues early
Commands
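This command creates a metric alert in Azure Monitor that notifies you when the average CPU usage of a virtual machine rises above 80%. As the output below shows, the average is taken over a 5-minute window and checked every minute by default. It helps detect problems early to keep your service reliable.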
Terminal
az monitor metrics alert create --name HighCPUAlert --resource-group example-rg --scopes /subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.Compute/virtualMachines/example-vm --condition "avg Percentage CPU > 80" --description "Alert when CPU usage is high"
Expected Output
{"id":"/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/microsoft.insights/metricAlerts/HighCPUAlert","name":"HighCPUAlert","type":"microsoft.insights/metricalerts","location":"global","tags":{},"properties":{"description":"Alert when CPU usage is high","enabled":true,"scopes":["/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.Compute/virtualMachines/example-vm"],"evaluationFrequency":"PT1M","windowSize":"PT5M","criteria":{"allOf":[{"criterionType":"StaticThresholdCriterion","metricName":"Percentage CPU","metricNamespace":"Microsoft.Compute/virtualMachines","operator":"GreaterThan","threshold":80,"timeAggregation":"Average"}]},"autoMitigate":true}}
--condition - Defines the metric and threshold to trigger the alert
--scopes - Specifies the resource to monitor
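The --scopes value is a full resource ID, which is easy to mistype. As a minimal sketch, the helper below assembles that ID from its parts. The function name vm_resource_id is hypothetical, and the subscription ID is the placeholder from the example above. For a VM that already exists, az vm show --resource-group example-rg --name example-vm --query id --output tsv should print the same kind of string.

```shell
# Hypothetical helper: build the resource ID of a VM from its parts.
vm_resource_id() {
  local subscription="$1" resource_group="$2" vm_name="$3"
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/virtualMachines/%s\n' \
    "$subscription" "$resource_group" "$vm_name"
}

# Placeholder subscription ID, resource group, and VM name from the example above.
scope="$(vm_resource_id 00000000-0000-0000-0000-000000000000 example-rg example-vm)"
echo "$scope"

# The value can then be passed to the alert command, for example:
#   az monitor metrics alert create --name HighCPUAlert --resource-group example-rg \
#     --scopes "$scope" --condition "avg Percentage CPU > 80"
```

Keeping the ID in a variable means the same value can be reused for several alerts on the same VM.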
This command creates an availability set for virtual machines. It spreads VMs across different hardware to reduce downtime if one server fails.
Terminal
az vm availability-set create --name example-avset --resource-group example-rg --platform-fault-domain-count 2 --platform-update-domain-count 5
Expected Output
{"id":"/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.Compute/availabilitySets/example-avset","name":"example-avset","type":"Microsoft.Compute/availabilitySets","location":"eastus","properties":{"platformFaultDomainCount":2,"platformUpdateDomainCount":5}}
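--platform-fault-domain-count - Number of fault domains (groups of hardware that share a power source and network switch) to spread VMs across
--platform-update-domain-count - Number of update domains (groups of VMs that can be rebooted together during planned maintenance) to spread VMs across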
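To get a feel for what these two counts mean, the sketch below simulates where four VMs would land with 2 fault domains and 5 update domains. It uses a simplified round-robin model, which is an assumption for illustration and not a statement of Azure's actual placement algorithm. The variable names are hypothetical.

```shell
# Simplified model (an assumption, not Azure's documented placement logic):
# VM number i lands in fault domain (i mod fault_domains)
# and update domain (i mod update_domains).
fault_domains=2
update_domains=5
vm_count=4

placement=""
i=0
while [ "$i" -lt "$vm_count" ]; do
  fd=$((i % fault_domains))
  ud=$((i % update_domains))
  line="vm$i fault-domain=$fd update-domain=$ud"
  echo "$line"
  placement="$placement$line;"
  i=$((i + 1))
done
```

In this model each fault domain holds two of the four VMs, so losing one fault domain still leaves half of them running, and each VM sits in its own update domain, so planned maintenance reboots them one at a time.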
This command creates a virtual machine inside the availability set to ensure it benefits from fault and update domain protection.
Terminal
az vm create --resource-group example-rg --name example-vm1 --image Ubuntu2204 --availability-set example-avset --admin-username azureuser --generate-ssh-keys
Expected Output
{ "fqdns": "", "id": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.Compute/virtualMachines/example-vm1", "location": "eastus", "name": "example-vm1", "powerState": "VM running", "resourceGroup": "example-rg", "zones": null }
--availability-set - Assigns the VM to the availability set for reliability
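An availability set only protects you when it holds at least two VMs, because a single VM still sits on a single piece of hardware. The sketch below wraps the command above in a loop that creates several VMs in the same set. The function name create_avset_vms and the VM naming scheme are hypothetical, and the Ubuntu2204 image alias is an assumption about what your Azure CLI version accepts.

```shell
# Hypothetical helper: create <count> VMs in the same availability set.
create_avset_vms() {
  local resource_group="$1" avset="$2" count="$3"
  local n=1
  while [ "$n" -le "$count" ]; do
    # Same flags as the single-VM command, with a numbered VM name.
    az vm create \
      --resource-group "$resource_group" \
      --name "example-vm$n" \
      --image Ubuntu2204 \
      --availability-set "$avset" \
      --admin-username azureuser \
      --generate-ssh-keys || return 1
    n=$((n + 1))
  done
}

# Usage (needs a logged-in Azure CLI and creates billable resources):
#   create_avset_vms example-rg example-avset 2
```

The function stops at the first failed az vm create call, so a partial failure is visible instead of being silently skipped.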
This command creates a virtual machine scale set with two identical VM instances for high availability. With the automatic upgrade policy, existing instances are brought up to date automatically whenever the scale set model changes.
Terminal
az vmss create --resource-group example-rg --name example-vmss --image Ubuntu2204 --upgrade-policy-mode automatic --admin-username azureuser --generate-ssh-keys --instance-count 2
Expected Output
{ "id": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.Compute/virtualMachineScaleSets/example-vmss", "location": "eastus", "name": "example-vmss", "provisioningState": "Succeeded", "resourceGroup": "example-rg" }
--upgrade-policy-mode - Controls how existing instances are updated when the scale set model changes (Automatic, Manual, or Rolling)
--instance-count - Number of VM instances to run
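After the scale set is created, it can help to confirm that both instances actually came up. The sketch below is one way to do that with az vmss list-instances. The function name show_vmss_instances is hypothetical, and the --query expression assumes the instance objects expose name and provisioningState fields in your CLI version's output.

```shell
# Hypothetical helper: list the instances of a scale set with their state.
show_vmss_instances() {
  local resource_group="$1" vmss_name="$2"
  az vmss list-instances \
    --resource-group "$resource_group" \
    --name "$vmss_name" \
    --query "[].{name:name, state:provisioningState}" \
    --output table
}

# Usage (needs a logged-in Azure CLI and an existing scale set):
#   show_vmss_instances example-rg example-vmss
```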
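This command creates an autoscale setting that keeps the scale set between 2 and 5 instances, starting from a default of 2. On its own it only defines these limits: the rules list in the output below is empty, so instances are added or removed based on demand only after scale rules are attached to the setting.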
Terminal
az monitor autoscale create --resource-group example-rg --resource example-vmss --resource-type Microsoft.Compute/virtualMachineScaleSets --name example-autoscale --min-count 2 --max-count 5 --count 2
Expected Output
{"id":"/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/microsoft.insights/autoscalesettings/example-autoscale","name":"example-autoscale","type":"microsoft.insights/autoscalesettings","location":"global","properties":{"enabled":true,"targetResourceUri":"/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.Compute/virtualMachineScaleSets/example-vmss","profiles":[{"name":"Auto created profile","capacity":{"minimum":"2","maximum":"5","default":"2"},"rules":[]}],"notifications":[]}}
--min-count - Minimum number of VM instances
--max-count - Maximum number of VM instances
--count - Default number of VM instances when no rule applies
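An autoscale setting with no rules never changes the instance count, so the next step is usually to attach rules to it. The sketch below adds one scale-out rule and one scale-in rule with az monitor autoscale rule create. The function name add_cpu_autoscale_rules is hypothetical, and the 70% and 30% CPU thresholds are illustrative assumptions, not recommended values.

```shell
# Hypothetical helper: attach CPU-based scale-out and scale-in rules
# to an existing autoscale setting.
add_cpu_autoscale_rules() {
  local resource_group="$1" autoscale_name="$2"

  # Add one instance when average CPU is above 70% over 5 minutes.
  az monitor autoscale rule create \
    --resource-group "$resource_group" \
    --autoscale-name "$autoscale_name" \
    --condition "Percentage CPU > 70 avg 5m" \
    --scale out 1 || return 1

  # Remove one instance when average CPU is below 30% over 5 minutes.
  az monitor autoscale rule create \
    --resource-group "$resource_group" \
    --autoscale-name "$autoscale_name" \
    --condition "Percentage CPU < 30 avg 5m" \
    --scale in 1
}

# Usage (needs a logged-in Azure CLI and the autoscale setting created above):
#   add_cpu_autoscale_rules example-rg example-autoscale
```

Pairing every scale-out rule with a scale-in rule keeps the scale set from staying at its maximum size after a load spike has passed.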
Key Concept

If you remember nothing else from this pattern, remember: design your cloud resources to detect problems early and recover automatically to keep your service running smoothly.

Common Mistakes
Mistake: Not setting up alerts to monitor resource health
Why it matters: Without alerts, you won't know when something goes wrong until users complain
Fix: Create metric alerts to get notified immediately about issues

Mistake: Deploying VMs without availability sets or scale sets
Why it matters: A single VM failure can cause downtime if not spread across fault domains
Fix: Use availability sets or scale sets to distribute VMs for higher reliability

Mistake: Not configuring autoscaling for variable workloads
Why it matters: Your service may become slow or crash under heavy load without enough resources
Fix: Set up autoscale rules to add or remove instances automatically
Summary
Create alerts to monitor important metrics and get notified of issues early.
Use availability sets or scale sets to spread resources and avoid single points of failure.
Configure autoscaling to adjust resources automatically based on demand.

Practice

1. Which of the following best describes the main goal of the Reliability pillar in cloud architecture?
easy
A. Ensure applications run without interruption and recover quickly from failures
B. Maximize the speed of application deployment
C. Reduce the cost of cloud resources
D. Improve the visual design of the application interface

Solution

  1. Step 1: Understand the reliability pillar purpose

    The reliability pillar focuses on keeping applications running smoothly and handling failures gracefully.
  2. Step 2: Compare options with the pillar goal

    Only option A, "Ensure applications run without interruption and recover quickly from failures", matches the goal of uninterrupted operation and quick recovery.
  3. Final Answer:

    Ensure applications run without interruption and recover quickly from failures -> Option A
  4. Quick Check:

    Reliability = uninterrupted and quick recovery
Hint: Reliability means apps stay up and fix themselves fast
Common Mistakes:
  • Confusing reliability with cost savings
  • Thinking reliability is about app speed or design
  • Mixing reliability with security or performance pillars
2. Which Azure service is primarily used to automatically recover from failures and maintain application availability?
easy
A. Azure Availability Zones
B. Azure Blob Storage
C. Azure DevTest Labs
D. Azure Logic Apps

Solution

  1. Step 1: Identify service for failure recovery

    Azure Availability Zones are physically separate locations within an Azure region. Spreading resources across them keeps an app running if one location fails.
  2. Step 2: Eliminate unrelated services

    Blob Storage is for data, DevTest Labs is for test environments, and Logic Apps is for workflows; none of them focuses on recovery.
  3. Final Answer:

    Azure Availability Zones -> Option A
  4. Quick Check:

    Recovery and availability = Availability Zones
Hint: Availability Zones protect apps by spreading resources
Common Mistakes:
  • Choosing storage or workflow services instead of availability features
  • Confusing testing environments with reliability tools
3. Consider this Azure setup: A web app is deployed across two Availability Zones with automatic failover configured. If one zone goes down, what happens?
medium
A. The app stops working until the zone is restored
B. Users must manually switch to a backup URL
C. The app data is lost permanently
D. Traffic automatically shifts to the healthy zone without downtime

Solution

  1. Step 1: Understand multi-zone deployment with failover

    Deploying across zones with failover means if one zone fails, traffic moves to the other automatically.
  2. Step 2: Analyze options for failover behavior

    Only option D, "Traffic automatically shifts to the healthy zone without downtime", describes an automatic traffic shift, which is what a failover design provides.
  3. Final Answer:

    Traffic automatically shifts to the healthy zone without downtime -> Option D
  4. Quick Check:

    Failover = automatic traffic shift
Hint: Failover means traffic moves automatically to healthy zone
Common Mistakes:
  • Assuming app stops or data is lost on zone failure
  • Thinking manual user action is needed for failover
4. You configured Azure Backup for your virtual machines but notice backups are failing. What is the most likely cause?
medium
A. The VM has no public IP address
B. The VM is running in an Availability Zone
C. Backup vault is not linked to the VM resource group
D. Backup is scheduled during off-peak hours

Solution

  1. Step 1: Check backup configuration requirements

    Azure Backup can only back up a VM that is correctly associated with a backup vault (in Azure, a Recovery Services vault in the same region as the VM). If that link is missing or broken, backups fail.
  2. Step 2: Evaluate other options

    Running in an Availability Zone, an off-peak schedule, or a missing public IP address does not prevent backups.
  3. Final Answer:

    Backup vault is not linked to the VM resource group -> Option C
  4. Quick Check:

    Backup fails if vault not linked properly
Hint: Backup needs vault linked to VM group
Common Mistakes:
  • Blaming zones or IP addresses for backup failure
  • Assuming schedule time causes failure
5. You want to design an Azure solution that automatically scales out when demand increases and recovers quickly from failures. Which combination of services best supports these reliability principles?
hard
A. Azure Virtual Machines with manual scaling and Azure Backup
B. Azure App Service with Auto Scale and Azure Traffic Manager
C. Azure Blob Storage with Azure Functions and Azure DevTest Labs
D. Azure Logic Apps with static IP and Azure Monitor

Solution

  1. Step 1: Identify services for automatic scaling and failover

    Azure App Service supports Auto Scale to handle demand changes, and Traffic Manager uses DNS-based routing to steer traffic away from unhealthy endpoints, which provides failover.
  2. Step 2: Eliminate options lacking auto scaling or failover

    Manual scaling or unrelated services do not meet both requirements.
  3. Final Answer:

    Azure App Service with Auto Scale and Azure Traffic Manager -> Option B
  4. Quick Check:

    Auto Scale + Traffic Manager = scaling and recovery
Hint: Auto Scale + Traffic Manager = scale and recover fast
Common Mistakes:
  • Choosing manual scaling instead of auto scaling
  • Confusing storage or testing services with reliability tools