Reliability pillar principles in Azure - Time & Space Complexity
We want to understand how the time to keep a system reliable changes as the system grows.
How does adding more parts affect the work to keep everything running well?
Analyze the time complexity of monitoring and recovering multiple Azure resources.
// Pseudocode for monitoring and recovery
repeat continuously:
    for each resource in resourceGroup:
        check health status
        if unhealthy:
            trigger recovery action
        log status
    wait for next check interval
This sequence checks many resources repeatedly to keep the system reliable.
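The pseudocode above can be turned into a small runnable Python sketch. `Resource`, `Monitor`, `check_health`, and `trigger_recovery` are hypothetical stand-ins, not Azure SDK types or calls: the health probe just reads a flag and "recovery" just flips it back. The sketch also counts health checks so that the cost per interval is visible. The endless outer loop is replaced by a fixed number of intervals so the example terminates.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Resource:
    # Hypothetical stand-in for an Azure resource; not an Azure SDK type.
    name: str
    healthy: bool = True


@dataclass
class Monitor:
    log: list = field(default_factory=list)
    health_checks: int = 0
    recoveries: int = 0

    def check_health(self, resource: Resource) -> bool:
        # Hypothetical health probe: counts as one "API call" per resource.
        self.health_checks += 1
        return resource.healthy

    def trigger_recovery(self, resource: Resource) -> None:
        # Hypothetical recovery action (e.g. a restart); here it only resets the flag.
        self.recoveries += 1
        resource.healthy = True

    def run_interval(self, resource_group: list) -> None:
        # One pass over the group: n health checks and at most n recoveries.
        for resource in resource_group:
            if not self.check_health(resource):
                self.trigger_recovery(resource)
            status = "healthy" if resource.healthy else "unhealthy"
            self.log.append(f"{resource.name}: {status}")

    def run(self, resource_group: list, intervals: int, interval_seconds: float = 0.0) -> None:
        # Bounded stand-in for "repeat continuously" so the example ends.
        for _ in range(intervals):
            self.run_interval(resource_group)
            time.sleep(interval_seconds)


if __name__ == "__main__":
    group = [Resource(f"vm-{i}") for i in range(10)]
    group[3].healthy = False  # simulate one failed resource
    monitor = Monitor()
    monitor.run(group, intervals=2)
    print(monitor.health_checks, monitor.recoveries)  # 20 1
```

With 10 resources and 2 intervals the monitor performs 20 health checks: each interval costs one check per resource, regardless of how many resources are actually unhealthy.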
We look at what happens over and over as the system runs.
- Primary operation: Checking health status of each resource and triggering recovery if needed.
- How many times: Once per resource every check interval, repeated continuously.
As the number of resources grows, the work to check and recover grows too.
| Input Size (n = number of resources) | Approx. API Calls/Operations |
|---|---|
| 10 | About 10 health checks per interval |
| 100 | About 100 health checks per interval |
| 1000 | About 1000 health checks per interval |
Pattern observation: The number of operations grows directly with the number of resources.
Time Complexity: O(n) per check interval
This means the monitoring and recovery work in each interval grows linearly with the number of resources: doubling the resources roughly doubles the work.
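A short worked bound makes the O(n) claim explicit. It assumes that one health check, one log write, and one recovery action each take roughly constant time; the symbols c_h, c_l, c_r (those constant costs) and u (the number of unhealthy resources in an interval) are introduced here for illustration only.

```latex
T(n) = \underbrace{n\,(c_h + c_l)}_{\text{check and log every resource}}
     + \underbrace{u\,c_r}_{\text{recover the unhealthy ones}}
     \;\le\; n\,(c_h + c_l + c_r) = O(n), \qquad 0 \le u \le n

\text{Over } k \text{ check intervals: } k \cdot T(n) = O(k\,n)
```

Even in the worst case, where every resource needs recovery, the bound stays linear in n because each resource contributes at most one check, one log entry, and one recovery per interval.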
[X] Wrong: "Adding more resources won't affect monitoring time much because checks happen fast."
[OK] Correct: Each resource adds its own check, so total time adds up directly with resource count.
Understanding how monitoring scales helps you design systems that stay reliable as they grow, a key skill in cloud work.
"What if we grouped resources and checked groups instead of individual resources? How would the time complexity change?"
Practice
Reliability pillar in cloud architecture?
Solution
Step 1: Understand the reliability pillar purpose
The reliability pillar focuses on keeping applications running smoothly and handling failures gracefully.
Step 2: Compare options with the pillar goal
Only "Ensure applications run without interruption and recover quickly from failures" matches the goal of uninterrupted operation and quick recovery.
Final Answer:
Ensure applications run without interruption and recover quickly from failures -> Option A
Quick Check:
Reliability = uninterrupted and quick recovery [OK]
- Confusing reliability with cost savings
- Thinking reliability is about app speed or design
- Mixing reliability with security or performance pillars
Solution
Step 1: Identify service for failure recovery
Azure Availability Zones are designed to keep apps running by spreading resources across physically separate locations within an Azure region.
Step 2: Eliminate unrelated services
Blob Storage is for data, DevTest Labs is for testing, and Logic Apps is for workflows; none of them focuses on recovery.
Final Answer:
Azure Availability Zones -> Option A
Quick Check:
Recovery and availability = Availability Zones [OK]
- Choosing storage or workflow services instead of availability features
- Confusing testing environments with reliability tools
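The benefit of spreading resources across zones can be illustrated with a tiny placement model. The zone labels "1", "2", "3" and the round-robin placement are assumptions for illustration; the sketch only shows why instances spread over several zones survive the loss of one zone, while instances packed into a single zone do not.

```python
from collections import Counter


def spread_across_zones(instance_count: int, zones: list) -> dict:
    """Place instances round-robin across zones and return the count per zone."""
    placement = Counter(zones[i % len(zones)] for i in range(instance_count))
    return {zone: placement[zone] for zone in zones}


def surviving_instances(placement: dict, failed_zone: str) -> int:
    """Count instances still running when one zone goes down."""
    return sum(count for zone, count in placement.items() if zone != failed_zone)


if __name__ == "__main__":
    zonal = spread_across_zones(6, ["1", "2", "3"])
    print(zonal)                              # {'1': 2, '2': 2, '3': 2}
    print(surviving_instances(zonal, "2"))    # 4
    single = spread_across_zones(6, ["1"])
    print(surviving_instances(single, "1"))   # 0
```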
Solution
Step 1: Understand multi-zone deployment with failover
Deploying across zones with failover means that if one zone fails, traffic moves to a healthy zone automatically.
Step 2: Analyze options for failover behavior
Only "Traffic automatically shifts to the healthy zone without downtime" describes an automatic traffic shift with no downtime, matching the failover design.
Final Answer:
Traffic automatically shifts to the healthy zone without downtime -> Option D
Quick Check:
Failover = automatic traffic shift [OK]
- Assuming app stops or data is lost on zone failure
- Thinking manual user action is needed for failover
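Automatic failover can be modeled as "route to the most preferred endpoint that is currently healthy." The sketch below is a simplified model under that assumption: `Endpoint`, the endpoint names, and the priority numbers are hypothetical, and real Azure services decide health with their own probes rather than a boolean flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    """Hypothetical deployment in one zone; not an Azure SDK type."""
    name: str
    priority: int          # lower number = preferred
    healthy: bool = True   # result of the latest (simulated) health probe


def route(endpoints: list) -> Optional[str]:
    """Send traffic to the highest-priority endpoint that is currently healthy."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None  # total outage: no zone can serve traffic
    return min(healthy, key=lambda e: e.priority).name


if __name__ == "__main__":
    endpoints = [Endpoint("app-zone1", priority=1), Endpoint("app-zone2", priority=2)]
    print(route(endpoints))        # app-zone1
    endpoints[0].healthy = False   # simulate a zone failure
    print(route(endpoints))        # app-zone2
```

No user action appears anywhere in the routing decision: once the failed zone is marked unhealthy, the next call simply picks the healthy one.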
Solution
Step 1: Check backup configuration requirements
Azure Backup requires the backup vault to be linked correctly to the VM's resource group for successful backups.
Step 2: Evaluate other options
Running in an Availability Zone, the scheduled backup time, or a public IP address does not prevent backups.
Final Answer:
Backup vault is not linked to the VM resource group -> Option C
Quick Check:
Backup fails if vault not linked properly [OK]
- Blaming zones or IP addresses for backup failure
- Assuming schedule time causes failure
Solution
Step 1: Identify services for automatic scaling and failover
Azure App Service supports Auto Scale to handle changes in demand, and Traffic Manager directs traffic to healthy endpoints for failover.
Step 2: Eliminate options lacking auto scaling or failover
Manual scaling or unrelated services do not meet both requirements.
Final Answer:
Azure App Service with Auto Scale and Azure Traffic Manager -> Option B
Quick Check:
Auto Scale + Traffic Manager = scaling and recovery [OK]
- Choosing manual scaling instead of auto scaling
- Confusing storage or testing services with reliability tools
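Autoscale works by comparing a metric against thresholds and adjusting the instance count within configured limits. The sketch below models one evaluation of such a rule; the 70% / 30% CPU thresholds and the 2 to 10 instance limits are made-up values, and the real Azure autoscale engine also aggregates metrics over a time window and applies cool-down periods, which this sketch omits.

```python
def next_instance_count(current: int, avg_cpu_percent: float,
                        minimum: int = 2, maximum: int = 10,
                        scale_out_above: float = 70.0,
                        scale_in_below: float = 30.0) -> int:
    """Apply one evaluation of a hypothetical CPU-based scale rule."""
    if avg_cpu_percent > scale_out_above:
        target = current + 1  # scale out by one instance
    elif avg_cpu_percent < scale_in_below:
        target = current - 1  # scale in by one instance
    else:
        target = current      # inside the band: no change
    # Never go below the minimum or above the maximum instance count.
    return max(minimum, min(maximum, target))


if __name__ == "__main__":
    instances = 2
    for cpu in [80, 85, 90, 50, 20, 10]:
        instances = next_instance_count(instances, cpu)
        print(cpu, instances)
```

Under rising load the count climbs one step per evaluation, holds steady in the middle band, and falls back as load drops, without ever leaving the configured limits. That automatic adjustment is what separates it from the manual-scaling distractor.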
