Rolling upgrades in Elasticsearch - Time & Space Complexity
When performing rolling upgrades in Elasticsearch, we want to know how the time to complete the upgrade changes as the cluster size grows.
We ask: How does the upgrade process scale with more nodes?
Analyze the time complexity of this rolling upgrade process.
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}
// Stop, upgrade, and restart one node
// Once it rejoins the cluster, re-enable allocation and wait for shard recovery
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}
// Repeat the whole sequence for the next node
This snippet restricts shard allocation before a node goes down, upgrades nodes one at a time, then re-enables allocation so shards can recover on the upgraded node. The cycle repeats for every node.
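The per-node cycle can be sketched as a loop. This is a minimal illustration, not an official client: `send` and `upgrade_node` are hypothetical callables supplied by the caller, and the `restricted` default of `"primaries"` with `persistent` scope follows the approach in the current Elasticsearch upgrade guide and is an assumption of this sketch. Waiting for green status is also a simplification.

```python
from typing import Callable, Iterable, Optional

# Hypothetical transport: send(method, path, body) performs one REST call.
Send = Callable[[str, str, Optional[dict]], None]

SETTING = "cluster.routing.allocation.enable"


def rolling_upgrade(
    nodes: Iterable[str],
    send: Send,
    upgrade_node: Callable[[str], None],
    restricted: str = "primaries",
) -> int:
    """Run the restrict / upgrade / re-enable cycle once per node.

    Returns the number of nodes processed.
    """
    done = 0
    for node in nodes:
        # 1. Restrict allocation so shards are not rebuilt elsewhere
        #    while this node is offline.
        send("PUT", "/_cluster/settings", {"persistent": {SETTING: restricted}})
        # 2. Stop, upgrade, and restart the node (hypothetical callback).
        upgrade_node(node)
        # 3. Reset the setting to its default by sending null.
        send("PUT", "/_cluster/settings", {"persistent": {SETTING: None}})
        # 4. Wait for recovery before touching the next node.
        send("GET", "/_cluster/health?wait_for_status=green&timeout=60s", None)
        done += 1
    return done
```

The loop body runs once per node and each pass issues a fixed number of requests, which is the repetition the analysis counts.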
Look at what repeats during the upgrade.
- Primary operation: Upgrading each node sequentially.
- How many times: Once per node in the cluster.
As the number of nodes increases, the total upgrade time grows in proportion, because nodes are upgraded one after another rather than in parallel.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 nodes | 10 upgrade steps |
| 100 nodes | 100 upgrade steps |
| 1000 nodes | 1000 upgrade steps |
Pattern observation: The total steps increase directly with the number of nodes.
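The counts in the table can be reproduced with a small sketch. The function name `upgrade_steps` is made up for illustration; it only counts one step per node and does not model how long each step takes.

```python
def upgrade_steps(node_count: int) -> int:
    """Count sequential upgrade steps: exactly one per node."""
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    steps = 0
    for _ in range(node_count):
        steps += 1  # one stop / upgrade / restart cycle
    return steps


for n in (10, 100, 1000):
    print(f"{n} nodes -> {upgrade_steps(n)} upgrade steps")
# 10 nodes -> 10 upgrade steps
# 100 nodes -> 100 upgrade steps
# 1000 nodes -> 1000 upgrade steps
```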
Time Complexity: O(n)
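This means the upgrade time grows linearly with the number of nodes in the cluster, assuming the time spent on each node (restart plus shard recovery) stays roughly constant as the cluster grows.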
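The linear bound can be written out explicitly. The symbols here are introduced for illustration: $t_i$ is the time to upgrade node $i$ and wait for its shards to recover, and $c$ is an assumed upper bound on that time that does not depend on $n$.

```latex
T(n) = \sum_{i=1}^{n} t_i, \qquad t_i \le c \ \text{for all } i
\;\Longrightarrow\; T(n) \le c\,n = O(n)
```

If the per-node time did grow with cluster size, for example because each node held more data in larger clusters, the bound on $t_i$ would no longer be constant and the total could grow faster than linearly.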
[X] Wrong: "Upgrading multiple nodes at once will always make the process faster without any risk."
[OK] Correct: Upgrading many nodes simultaneously can cause cluster instability and longer recovery times, which may slow down the overall process.
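One concrete way a parallel upgrade can hurt: if every copy of a shard sits on nodes in the same batch, that shard is unavailable until one of them returns. The check below is a sketch under assumed inputs; the `layout` mapping and node names are hypothetical, and a real cluster's shard placement would have to be read from the cluster itself.

```python
from typing import Dict, Iterable, List, Set


def shards_left_without_copies(
    shard_copies: Dict[str, Set[str]],
    batch: Iterable[str],
) -> List[str]:
    """Return shards whose every copy lives on a node in `batch`."""
    offline = set(batch)
    return sorted(
        shard
        for shard, nodes in shard_copies.items()
        if nodes and nodes <= offline  # all copies would be offline together
    )


# Hypothetical layout: shard name -> nodes holding a primary or replica copy.
layout = {
    "logs-0": {"node-1", "node-2"},
    "logs-1": {"node-2", "node-3"},
    "logs-2": {"node-1", "node-3"},
}

print(shards_left_without_copies(layout, ["node-1"]))            # []
print(shards_left_without_copies(layout, ["node-1", "node-2"]))  # ['logs-0']
```

With one node per batch, no shard in this layout loses all its copies; with two, `logs-0` does.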
Understanding how rolling upgrades scale helps you explain real-world system maintenance and reliability, a key skill for managing distributed systems.
"What if we upgraded two nodes at the same time instead of one? How would the time complexity change?"
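One way to explore this question is to count upgrade rounds for a fixed batch size. The helper `upgrade_rounds` is hypothetical and assumes each round takes about the same time regardless of how many nodes it contains, which may not hold on a real cluster.

```python
def upgrade_rounds(node_count: int, batch_size: int) -> int:
    """Number of sequential rounds when `batch_size` nodes are upgraded together."""
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Integer ceiling division: the last round may be partially filled.
    return (node_count + batch_size - 1) // batch_size


for n in (10, 100, 1000):
    print(n, upgrade_rounds(n, 1), upgrade_rounds(n, 2))
# 10 10 5
# 100 100 50
# 1000 1000 500
```

Under that assumption, a batch size of two halves the number of rounds, but the count still grows in direct proportion to the number of nodes, so the complexity class stays O(n) and only the constant factor changes.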