Overview - Availability sets for redundancy

What is it?

Availability sets are a way to group virtual machines in Azure so that they keep running even if some of the underlying hardware fails. They spread VMs across different physical hardware to avoid a single point of failure. This helps keep applications available and reliable. It is a simple way to improve uptime without a complex setup.

Why it matters

Without availability sets, VMs may end up on the same physical server, rack, or network switch. If that hardware fails, all of them stop at once, causing downtime. This can disrupt services, erode customer trust, and cost money. Availability sets reduce this risk by spreading VMs so that a single hardware failure or maintenance reboot affects only some of them, while the rest keep the service running.

Where it fits

Before learning availability sets, you should understand what virtual machines are and basic cloud concepts like regions and data centers. After this, you can learn about more advanced redundancy options like availability zones and load balancers to build even stronger systems.

Mental Model

Core Idea

Availability sets keep virtual machines on separate physical hardware so that if one fails, others keep running.

Think of it like...

Imagine a group of friends going on a trip in different cars instead of one van. If one car breaks down, the others can still reach the destination.

┌───────────────────────────────────┐
│         Availability Set          │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ VM on server │ │ VM on server │ │
│ │ in rack A    │ │ in rack B    │ │
│ └──────────────┘ └──────────────┘ │
│ Different fault & update domains  │
└───────────────────────────────────┘

Build-Up - 6 Steps

1

Foundation: What is an Availability Set

Concept: Introduction to the basic idea of grouping VMs for redundancy.

An availability set is a logical grouping of virtual machines in Azure. It ensures that the VMs are spread across multiple physical servers, racks, and network switches. This way, if one hardware component fails, only some VMs are affected, not all.

Result

You understand that availability sets help avoid single points of failure by distributing VMs.

Understanding that physical hardware can fail helps explain why grouping VMs across different hardware is important.
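The spreading described in this step can be sketched with a toy model. The Python below is an illustration only. It assumes that each fault domain is one rack and that VMs are placed round-robin across fault domains. place_vms and survivors_after_rack_failure are hypothetical helpers; Azure does the real placement for you.

```python
def place_vms(vm_names, fault_domain_count):
    """Assign each VM to a fault domain round-robin (simplified, hypothetical model)."""
    return {vm: i % fault_domain_count for i, vm in enumerate(vm_names)}


def survivors_after_rack_failure(placement, failed_domain):
    """Return the VMs still running when one fault domain (rack) fails."""
    return sorted(vm for vm, fd in placement.items() if fd != failed_domain)


vms = ["web-1", "web-2", "web-3"]

# Every VM on the same rack: one hardware failure takes down everything.
same_rack = place_vms(vms, fault_domain_count=1)
print(survivors_after_rack_failure(same_rack, failed_domain=0))  # []

# Spread across two fault domains: a rack failure leaves some VMs running.
spread = place_vms(vms, fault_domain_count=2)
print(survivors_after_rack_failure(spread, failed_domain=0))  # ['web-2']
```

With a single fault domain, one failure removes every VM. That is exactly the situation an availability set exists to prevent.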

2

Foundation: Fault Domains and Update Domains

3

Intermediate: Configuring Availability Sets in Azure

4

Intermediate: Impact on Service Availability

5

Advanced: Limitations and Best Practices

6

Expert: Internal Azure Handling of Availability Sets

Under the Hood

Azure's infrastructure divides physical servers into fault domains. Each fault domain represents a separate rack with its own power source and network switch. Update domains are groups of VMs that can be rebooted together during planned maintenance, and Azure reboots only one update domain at a time. When you create VMs in an availability set, Azure assigns them to these domains automatically to spread risk. Azure's control plane manages this: it tracks hardware health and schedules updates to minimize impact.
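The automatic assignment can be pictured as round-robin placement over both kinds of domain. This is a simplified model, not Azure's actual control-plane logic. It assumes 2 fault domains and 5 update domains (5 is Azure's default update domain count for an availability set). assign_domains is a hypothetical helper.

```python
from collections import namedtuple

# One VM's position in the (hypothetical) model: which rack, which reboot group.
Placement = namedtuple("Placement", ["fault_domain", "update_domain"])


def assign_domains(vm_count, fault_domains=2, update_domains=5):
    """Round-robin each VM over fault and update domains (simplified model)."""
    return [
        Placement(fault_domain=i % fault_domains, update_domain=i % update_domains)
        for i in range(vm_count)
    ]


for index, p in enumerate(assign_domains(6)):
    print(f"VM {index}: FD {p.fault_domain}, UD {p.update_domain}")
# VM 0: FD 0, UD 0
# VM 1: FD 1, UD 1
# VM 2: FD 0, UD 2
# VM 3: FD 1, UD 3
# VM 4: FD 0, UD 4
# VM 5: FD 1, UD 0   <- the sixth VM wraps back to update domain 0
```

In this model the sixth VM shares an update domain with the first, which is also how Azure describes update domain placement. Once a set holds more VMs than domains, some VMs inevitably share a domain.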

Why designed this way?

This design balances simplicity and effectiveness. Hardware faults and planned maintenance are common causes of downtime. Both tend to hit many VMs at once when those VMs share hardware. By isolating VMs across fault and update domains, Azure reduces these correlated failures without requiring a complex setup from the user. Alternatives such as manual VM placement are error-prone and hard to scale, so automated domain assignment was chosen for reliability and ease of use.

┌───────────────────────────────────────┐
│           Azure Data Center           │
│ ┌────────────────┐ ┌────────────────┐ │
│ │ Fault Domain 0 │ │ Fault Domain 1 │ │
│ │    ┌──────┐    │ │    ┌──────┐    │ │
│ │    │ VM 1 │    │ │    │ VM 2 │    │ │
│ │    └──────┘    │ │    └──────┘    │ │
│ └────────────────┘ └────────────────┘ │
│ Update domains: reboot one at a time  │
└───────────────────────────────────────┘

Myth Busters - 4 Common Misconceptions

Quick: Does an availability set protect your VMs if the entire Azure region goes down? Commit yes or no.

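Common Belief: Availability sets protect VMs from any kind of failure, including entire region outages.

Reality: No. An availability set spreads VMs across hardware inside one data center, so a data center or region outage still takes down every VM in the set. Use availability zones or multi-region deployments for that level of protection.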

Quick: Can you add an existing VM to an availability set after creation? Commit yes or no.

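Common Belief: You can add any VM to an availability set anytime after creation.

Reality: No. A VM can only be placed in an availability set when it is created. To move an existing VM into a set, you have to recreate the VM with the availability set specified.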

Quick: Do all VMs in an availability set reboot at the same time during maintenance? Commit yes or no.

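Common Belief: All VMs in an availability set reboot together during updates.

Reality: No. During planned maintenance, Azure reboots one update domain at a time, so VMs in the other update domains keep running.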

Quick: Does having only one VM in an availability set provide redundancy? Commit yes or no.

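Common Belief: A single VM in an availability set is enough for redundancy.

Reality: No. With only one VM, nothing is left to take over when its hardware fails or it is rebooted. You need at least two VMs so that they land in different fault and update domains.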

Expert Zone

1

Azure's fault domain count varies by region and can affect how many VMs you can effectively spread across hardware.

2

Update domains are not guaranteed to be rebooted in a strict order. However, only one update domain is rebooted at a time, which limits how many VMs are down simultaneously.

3

Availability sets do not protect against software or application-level failures; combining with monitoring and auto-healing is essential.
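The first two points above can be checked with a little arithmetic. Under a simplified round-robin model (an assumption, not Azure's documented algorithm), the fullest domain holds the VM count divided by the domain count, rounded up. That is the most VMs that one hardware failure or one maintenance wave can take down. The fault domain counts 2 and 3 are example inputs only. worst_case_down and running_per_wave are hypothetical helpers.

```python
import math
import random


def worst_case_down(vm_count, domain_count):
    """VMs in the fullest domain when VMs are spread round-robin (simplified model)."""
    return math.ceil(vm_count / domain_count)


def running_per_wave(vm_count, update_domains, seed=None):
    """Reboot one update domain at a time, in a random order.

    Returns how many VMs are running during each reboot wave.
    """
    domain_of = [i % update_domains for i in range(vm_count)]
    order = list(range(update_domains))
    random.Random(seed).shuffle(order)  # the order is not guaranteed, so randomize it
    return [sum(1 for d in domain_of if d != ud) for ud in order]


# Point 1: fewer fault domains means a larger share of VMs lost per hardware failure.
for fds in (2, 3):
    print(f"6 VMs, {fds} fault domains: up to {worst_case_down(6, fds)} lost")
# 6 VMs, 2 fault domains: up to 3 lost
# 6 VMs, 3 fault domains: up to 2 lost

# Point 2: whatever the reboot order, capacity never drops below the same floor.
waves = running_per_wave(7, update_domains=5, seed=42)
print(min(waves), 7 - worst_case_down(7, 5))  # 5 5
```

The shuffle reflects the second point. Because only one update domain is down per wave, the reboot order does not change the minimum number of running VMs.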

When NOT to use

Availability sets are not suitable when you need protection against entire data center failures or want geographic redundancy. In those cases, use availability zones or multi-region deployments. Also, for stateless or containerized workloads, orchestrators like Kubernetes offer better scaling and redundancy.
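The difference in scope can be summarized as a small lookup table. This sketch assumes a simplified hierarchy of failure scopes (host, rack, data center, region). It records the largest scope each option named in this section is designed to survive. The SCOPES and SURVIVES_UP_TO tables and the survives helper are illustrative, not an Azure API.

```python
# Failure scopes from smallest to largest (simplified model).
SCOPES = ["host", "rack", "datacenter", "region"]

# Largest failure scope each option is designed to survive (simplified model).
SURVIVES_UP_TO = {
    "availability set": "rack",          # spread across racks in one data center
    "availability zones": "datacenter",  # spread across data centers in one region
    "multi-region": "region",            # copies in separate regions
}


def survives(option, failure_scope):
    """True if the option keeps some VMs running when failure_scope fails."""
    return SCOPES.index(failure_scope) <= SCOPES.index(SURVIVES_UP_TO[option])


for scope in SCOPES:
    ok = [opt for opt in SURVIVES_UP_TO if survives(opt, scope)]
    print(f"{scope:>10}: {', '.join(ok)}")
#       host: availability set, availability zones, multi-region
#       rack: availability set, availability zones, multi-region
# datacenter: availability zones, multi-region
#     region: multi-region
```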

Production Patterns

In production, availability sets are often combined with load balancers to distribute traffic across VMs. Teams deploy at least two VMs per availability set and monitor health to replace failed instances quickly. For critical apps, availability sets are a baseline, supplemented by zones or regions for disaster recovery.
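The monitor-and-replace loop mentioned here can be sketched as a single reconciliation pass. This is a minimal illustration that assumes the results of a hypothetical application-level health probe arrive as a dictionary. plan_repairs is a hypothetical helper, and the minimum of two healthy VMs mirrors the two-VM baseline above. A real system would act on the result, for example by redeploying the listed VMs.

```python
def plan_repairs(health_by_vm, min_healthy=2):
    """One pass of a hypothetical monitor-and-replace loop.

    health_by_vm maps VM name -> result of an application health probe.
    Returns (vms_to_replace, below_minimum).
    """
    unhealthy = sorted(vm for vm, ok in health_by_vm.items() if not ok)
    healthy_count = len(health_by_vm) - len(unhealthy)
    return unhealthy, healthy_count < min_healthy


# vm-2 is powered on (its hardware is fine) but its application is hung,
# so only an application-level health probe notices the problem.
print(plan_repairs({"vm-1": True, "vm-2": False, "vm-3": True}))  # (['vm-2'], False)
```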

Connections

Load Balancing

Builds-on

Availability sets ensure VMs stay up during failures, while load balancers distribute user traffic to healthy VMs, together providing continuous service.
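This cooperation can be sketched as a toy round-robin balancer that skips backends that fail their health probe. It is a simplified model, not how Azure Load Balancer is implemented. HealthAwareBalancer, mark, and next_backend are hypothetical names, and the backend names are made up to suggest VMs in fault domains 0 and 1.

```python
import itertools


class HealthAwareBalancer:
    """Toy round-robin balancer that only routes to healthy backends."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark(self, backend, is_healthy):
        """Record the result of a health probe for one backend."""
        if is_healthy:
            self.healthy.add(backend)
        else:
            self.healthy.discard(backend)

    def next_backend(self):
        """Return the next healthy backend, or None if all are down."""
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        return None


lb = HealthAwareBalancer(["vm-fd0", "vm-fd1"])
lb.mark("vm-fd0", False)  # fault domain 0 lost power
print([lb.next_backend() for _ in range(3)])  # ['vm-fd1', 'vm-fd1', 'vm-fd1']
```

The availability set is what guarantees that vm-fd1 survives the fault domain failure. The balancer is what stops users from being sent to vm-fd0.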

Disaster Recovery

Complementary

Availability sets handle local hardware failures, but disaster recovery plans cover large-scale outages by replicating data and services across regions.

Fault Tolerance in Mechanical Engineering

Same pattern

Both use redundancy and separation of components to prevent total system failure when one part breaks.

Common Pitfalls

#1 Trying to add an existing VM to an availability set after creation.

Wrong approach: Create myVM on its own, then try to attach it to mySet afterwards (for example with "az vm availability-set add", a command the Azure CLI does not provide).

Correct approach: Assign the availability set during VM creation: az vm create --name myVM --availability-set mySet ...

Root cause: Not knowing that a VM's availability set must be assigned when the VM is created.

#2 Deploying only one VM in an availability set and expecting redundancy.

Wrong approach: Create one VM in an availability set and assume it is protected.

Correct approach: Deploy at least two VMs in the availability set so that they are spread across fault domains.

Root cause: Not realizing that redundancy requires multiple instances placed in separate failure domains.

#3 Assuming availability sets protect against region-wide outages.

Wrong approach: Rely solely on availability sets for disaster recovery.

Correct approach: Use availability zones to survive a data center failure, and multi-region replication to survive a region-wide outage.

Root cause: Confusing local hardware redundancy with geographic disaster recovery.

Key Takeaways

Availability sets group virtual machines to spread them across different physical hardware to reduce downtime.

They use fault domains and update domains to isolate hardware failures and maintenance reboots.

VMs must be assigned to availability sets during creation; existing VMs cannot be added later.

Availability sets protect against local hardware failures but not against entire data center or region outages.

For full resilience, availability sets are combined with load balancers, availability zones, and disaster recovery strategies.