Reliability pillar principles in Azure - Time & Space Complexity
We want to understand how the work needed to keep a system reliable changes as the system grows.
How does adding more resources affect the effort of monitoring them and recovering them when they fail?
Analyze the time complexity of monitoring and recovering multiple Azure resources.
    // Pseudocode for monitoring and recovery
    for each resource in resourceGroup:
        check health status
        if unhealthy:
            trigger recovery action
        log status
    wait for next check interval
This sequence checks many resources repeatedly to keep the system reliable.
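The pseudocode above can be sketched as runnable Python. This is a minimal illustration, not Azure SDK code: the `Resource` class and the `is_healthy` and `restart` helpers are hypothetical stand-ins for whatever health probe and recovery action a real system would use. The sketch runs one pass over the resources and counts the operations. The wait between passes is left out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Resource:
    """Hypothetical stand-in for an Azure resource (not an SDK type)."""
    name: str
    healthy: bool = True


def monitor_once(
    resources: Iterable[Resource],
    check_health: Callable[[Resource], bool],
    recover: Callable[[Resource], None],
) -> dict:
    """Run one monitoring pass and count the operations performed."""
    counts = {"checks": 0, "recoveries": 0}
    for resource in resources:
        counts["checks"] += 1  # exactly one health check per resource
        if not check_health(resource):
            recover(resource)  # recovery runs only for unhealthy resources
            counts["recoveries"] += 1
        print(f"{resource.name}: healthy={resource.healthy}")  # log status
    return counts


def is_healthy(resource: Resource) -> bool:
    # Assumed probe: a real check would call a monitoring endpoint.
    return resource.healthy


def restart(resource: Resource) -> None:
    # Assumed recovery action: here it only flips the flag.
    resource.healthy = True


group = [Resource("vm-1"), Resource("db-1", healthy=False), Resource("app-1")]
counts = monitor_once(group, is_healthy, restart)
print(counts)  # {'checks': 3, 'recoveries': 1}
```

The loop body runs once per resource, so the `checks` counter always equals the number of resources passed in.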
We look at what happens over and over as the system runs.
- Primary operation: Checking health status of each resource and triggering recovery if needed.
- How many times: Once per resource every check interval, repeated continuously.
As the number of resources grows, the work to check and recover grows too.
| Number of resources (n) | Approx. operations per check interval |
|---|---|
| 10 | About 10 health checks per interval |
| 100 | About 100 health checks per interval |
| 1000 | About 1000 health checks per interval |
Pattern observation: The number of operations grows directly with the number of resources.
Time Complexity: O(n) per check interval
This means the monitoring work in each check interval grows linearly with the number of resources: doubling the resources roughly doubles the health checks.
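As a rough derivation under stated assumptions: suppose each health check costs a constant $c_h$, each log write a constant $c_l$, and each recovery action a constant $c_r$, and that $k \le n$ of the $n$ resources are unhealthy in a given interval. These symbols are introduced here for illustration only. The work for one pass is then:

$$
T(n) = n\,(c_h + c_l) + k\,c_r \;\le\; n\,(c_h + c_l + c_r) = O(n)
$$

Even in the worst case, where every resource needs recovery, the bound stays linear as long as each individual operation takes constant time.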
[X] Wrong: "Adding more resources won't affect monitoring time much because checks happen fast."
[OK] Correct: Each resource adds its own check, so when checks run one after another the total time grows in direct proportion to the resource count.
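A quick calculation shows why fast checks still add up. The 50 ms figure below is an assumed latency chosen for illustration, not a measured Azure number, and the sketch assumes checks run sequentially.

```python
def sequential_pass_ms(resource_count: int, per_check_ms: int) -> int:
    """Total time for one pass when checks run one after another."""
    return resource_count * per_check_ms


PER_CHECK_MS = 50  # assumed latency of a single health check

for n in (10, 100, 1000):
    total_ms = sequential_pass_ms(n, PER_CHECK_MS)
    print(f"{n} resources -> {total_ms} ms per pass")
```

Under these assumptions, 1000 resources take 50,000 ms (50 seconds) per pass, which could exceed a short check interval even though each check is fast on its own.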
Understanding how monitoring scales helps you design systems that stay reliable as they grow, a key skill in cloud work.
"What if we grouped resources and checked groups instead of individual resources? How would the time complexity change?"