Cluster upgrade strategies in Kubernetes - Step-by-Step Execution

Process Flow - Cluster upgrade strategies
Start: Current Cluster Version -> Choose Upgrade Strategy -> In-place -> Upgrade Nodes -> Verify & Test -> Complete Upgrade & Cleanup -> End
This flow shows the main steps in upgrading a Kubernetes cluster using different strategies: in-place, blue-green, and canary/rolling upgrades.
Execution Sample
kubectl drain node1 --ignore-daemonsets
# Upgrade node (e.g., kubeadm upgrade node on node1)
kubectl uncordon node1
This sequence drains a node, upgrades it, then brings it back to the cluster.
Process Table
| Step | Action | Node State | Cluster State | Result |
|------|--------|------------|---------------|--------|
| 1 | Drain node1 | cordoned, pods evicted | Ready nodes reduced by 1 | Node1 ready for upgrade |
| 2 | Upgrade node1 | upgrading | Cluster running with node1 offline | Node1 updated to new version |
| 3 | Uncordon node1 | ready | All nodes ready, cluster upgraded | Node1 back in service |
| 4 | Verify cluster | all nodes ready | Cluster version updated | Upgrade successful |
| 5 | Exit | N/A | N/A | Upgrade complete, cluster stable |
💡 All nodes upgraded and cluster is stable with new version
Status Tracker
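| Variable | Start | After Step 1 | After Step 2 | After Step 3 | Final |
|----------|-------|--------------|--------------|--------------|-------|
| node1_state | ready | cordoned | upgrading | ready | ready |
| cluster_ready_nodes | 3 | 2 | 2 | 3 | 3 |
| cluster_version | v1.20 | v1.20 | v1.20 | v1.21 | v1.21 |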
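The cluster_ready_nodes row can be checked from the command line. The following is a minimal sketch, not a definitive tool: it assumes the default `kubectl get nodes --no-headers` layout, where the second column is STATUS and a cordoned node reports `Ready,SchedulingDisabled`. The function name `count_schedulable_nodes` and the sample rows are hypothetical and only mirror the tracker after Step 1.

```shell
# Count nodes that are Ready and still schedulable.
# Reads `kubectl get nodes --no-headers` style output on stdin.
# A cordoned node shows "Ready,SchedulingDisabled", so it is not counted.
count_schedulable_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Hypothetical sample output: node1 has been drained, node2 and node3 are serving.
sample='node1   Ready,SchedulingDisabled   <none>   10d   v1.20.0
node2   Ready                      <none>   10d   v1.20.0
node3   Ready                      <none>   10d   v1.20.0'

printf '%s\n' "$sample" | count_schedulable_nodes   # prints 2

# Against a real cluster:
# kubectl get nodes --no-headers | count_schedulable_nodes
```

Running the same check after the uncordon step should bring the count back to the starting value.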
Key Moments - 3 Insights
Why do we drain the node before upgrading?
Draining cordons the node and evicts its pods so that workloads are rescheduled onto other nodes, which keeps the upgrade from taking running workloads down with the node (see the Process Table, step 1).
What happens if we uncordon the node before upgrade finishes?
The scheduler can place pods on a node that is still mid-upgrade and may be unstable or incompatible, causing errors. The upgrade must complete before uncordoning (see the Process Table, step 3).
How does cluster version change during upgrade?
In this walkthrough the tracked version changes only once the upgraded node has rejoined the cluster in the ready state (see the Status Tracker: cluster_version changes after Step 3 and stays at v1.21).
Visual Quiz - 3 Questions
Test your understanding
According to the Process Table, what is the state of node1 after step 2?
A. upgrading
B. ready
C. cordoned
D. drained
💡 Hint
Check the 'Node State' column for step 2 in the Process Table.
At which step does the cluster version update?
A. Step 1
B. Step 2
C. Step 4
D. Step 3
💡 Hint
Look at the 'cluster_version' row in the Status Tracker after each step.
If we skip draining the node, what likely happens?
A. Node upgrades faster
B. Pods remain running on the node during the upgrade, causing downtime
C. Cluster version updates automatically
D. Node is automatically uncordoned
💡 Hint
Refer to Key Moments for why draining is important before an upgrade.
Concept Snapshot
Cluster Upgrade Strategies:
- In-place: drain node, upgrade, uncordon
- Blue-Green: create new cluster, switch traffic
- Canary/Rolling: upgrade subset, monitor, rollback if needed
Always drain nodes before upgrade to avoid downtime.
Verify cluster stability after upgrade.
Full Transcript
This visual execution shows how Kubernetes cluster upgrades happen step-by-step. First, you drain a node to safely move workloads away. Then you upgrade the node software. After upgrade, you uncordon the node to bring it back into service. The cluster version updates only after nodes are upgraded and ready. This prevents downtime and keeps the cluster stable. Different strategies like in-place, blue-green, or canary upgrades exist, but all require careful node management and verification.
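The in-place sequence described above can be sketched as a loop over nodes. This is an illustrative dry run, not a production script: by default it only prints the kubectl commands (the `KUBECTL` variable is an assumption of this sketch), and `upgrade_node_packages` is a hypothetical placeholder, because the actual upgrade step depends on how the cluster was installed (for kubeadm clusters it would typically involve `kubeadm upgrade node` run on the node itself).

```shell
# Dry run by default: commands are printed, not executed.
# Set KUBECTL=kubectl before running to act on a real cluster.
KUBECTL="${KUBECTL:-echo kubectl}"

# Hypothetical placeholder for the installer-specific upgrade step.
upgrade_node_packages() {
  echo "upgrade $1 (placeholder)"
}

# Drain, upgrade, and uncordon each node in turn; stop at the first failure
# so that a broken node is not followed by draining the next one.
rolling_upgrade() {
  for node in "$@"; do
    $KUBECTL drain "$node" --ignore-daemonsets || return 1
    upgrade_node_packages "$node" || return 1
    $KUBECTL uncordon "$node" || return 1
  done
}

rolling_upgrade node1 node2 node3
```

Processing one node at a time keeps the other nodes available for the evicted pods, which is the property the walkthrough relies on.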

Practice

1. What is the recommended order when upgrading a Kubernetes cluster?
easy
A. Upgrade all nodes simultaneously
B. Upgrade worker nodes first, then control plane nodes
C. Upgrade control plane nodes first, then worker nodes
D. Upgrade only the worker nodes

Solution

  1. Step 1: Understand the role of control plane nodes

    Control plane nodes manage the cluster state and API server, so they must be stable first.
  2. Step 2: Upgrade worker nodes after control plane

    Worker nodes run workloads and depend on the control plane, so upgrade them after control plane nodes.
  3. Final Answer:

    Upgrade control plane nodes first, then worker nodes -> Option C
  4. Quick Check:

    Control plane first, workers second = C [OK]
Hint: Always upgrade control plane nodes before worker nodes [OK]
Common Mistakes:
  • Upgrading worker nodes before control plane
  • Upgrading all nodes at once causing downtime
  • Skipping control plane upgrade
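One way to see why the order matters: a node's kubelet should not run a newer version than the API server it talks to. The check below is a minimal sketch of that rule under simplifying assumptions: both versions share the same major version and look like `v1.21.3`. The helper names `minor_of` and `kubelet_not_newer` are hypothetical, and the version strings are illustrative values.

```shell
# Extract the minor version number from a string such as v1.21.3.
minor_of() {
  echo "$1" | cut -d. -f2
}

# Usage: kubelet_not_newer <apiserver-version> <kubelet-version>
# Succeeds when the kubelet minor version is not newer than the API server's.
kubelet_not_newer() {
  [ "$(minor_of "$2")" -le "$(minor_of "$1")" ]
}

# Control plane already on v1.21, worker still on v1.20: acceptable.
if kubelet_not_newer v1.21.0 v1.20.0; then
  echo "ok: control plane upgraded first"
fi

# Worker upgraded to v1.21 while the control plane is still on v1.20: rejected.
if ! kubelet_not_newer v1.20.0 v1.21.0; then
  echo "blocked: worker would be newer than control plane"
fi
```

Upgrading the control plane first keeps every intermediate state on the acceptable side of this check.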
2. Which command correctly drains a node before upgrading it?
easy
A. kubectl drain <node-name> --ignore-daemonsets --delete-local-data
B. kubectl upgrade node <node-name>
C. kubectl delete node <node-name>
D. kubectl cordon <node-name> --force

Solution

  1. Step 1: Identify the correct drain command syntax

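    The command to safely evict pods is 'kubectl drain <node-name>', with --ignore-daemonsets to skip DaemonSet-managed pods and --delete-local-data to allow eviction of pods that use emptyDir volumes. Newer kubectl releases deprecate --delete-local-data in favor of --delete-emptydir-data.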
  2. Step 2: Verify other options are incorrect

    Upgrade and delete commands do not drain nodes; cordon only marks unschedulable but does not evict pods.
  3. Final Answer:

    kubectl drain <node-name> --ignore-daemonsets --delete-local-data -> Option A
  4. Quick Check:

    Drain command with correct flags = A [OK]
Hint: Use 'kubectl drain' with flags to safely evict pods [OK]
Common Mistakes:
  • Using 'kubectl cordon' instead of 'drain'
  • Deleting nodes instead of draining
  • Missing flags causing pod eviction failure
3. Given this upgrade sequence, what is the expected cluster state?
1. Drain node1
2. Upgrade node1
3. Uncordon node1
4. Repeat for node2 and node3
medium
A. Control plane nodes are upgraded last
B. Cluster remains available with minimal downtime
C. Pods are deleted permanently during upgrade
D. Cluster goes down during node upgrades

Solution

  1. Step 1: Analyze the upgrade steps

    Each node is drained to safely evict pods, upgraded, then uncordoned to resume scheduling.
  2. Step 2: Understand impact on cluster availability

    Upgrading nodes one by one with draining keeps workloads running on other nodes, minimizing downtime.
  3. Final Answer:

    Cluster remains available with minimal downtime -> Option B
  4. Quick Check:

    Draining and upgrading nodes one by one = B [OK]
Hint: Upgrade nodes one at a time with drain/uncordon for uptime [OK]
Common Mistakes:
  • Assuming cluster goes down during upgrades
  • Not draining nodes causing pod failures
  • Upgrading all nodes simultaneously
4. You ran kubectl drain node1 but pods did not evict. What is the likely cause?
medium
A. DaemonSet pods are blocking eviction
B. Node is already uncordoned
C. Control plane node cannot be drained
D. Pods have no local storage

Solution

  1. Step 1: Understand drain behavior with DaemonSets

    By default, 'kubectl drain' refuses to proceed when the node runs DaemonSet-managed pods; passing --ignore-daemonsets lets the drain continue and leaves those pods in place.
  2. Step 2: Check other options for correctness

    Uncordon status does not block eviction; control plane nodes can be drained; pods without local storage do not block drain.
  3. Final Answer:

    DaemonSet pods are blocking eviction -> Option A
  4. Quick Check:

    DaemonSet pods block drain without flag = A [OK]
Hint: Use --ignore-daemonsets flag to drain nodes with DaemonSet pods [OK]
Common Mistakes:
  • Not using --ignore-daemonsets flag
  • Confusing cordon with drain
  • Assuming control plane nodes cannot be drained
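To confirm that DaemonSet pods are what a drain is stopping on, it can help to list the pods on the node together with the kind of their owner. The filter below is a rough sketch: `daemonset_pods` is a hypothetical helper, the sample rows are invented, and the commented kubectl invocation assumes each pod's first owner reference identifies its controller.

```shell
# Keep only pods owned by a DaemonSet.
# Expects three whitespace-separated columns per line: namespace, pod name, owner kind.
daemonset_pods() {
  awk '$3 == "DaemonSet" { print $1 "/" $2 }'
}

# Hypothetical sample rows for node1.
sample='kube-system   kube-proxy-abcde   DaemonSet
default       web-5d9c7-xk2lp    ReplicaSet
kube-system   fluentd-x1y2z      DaemonSet'

printf '%s\n' "$sample" | daemonset_pods

# Against a real cluster (column paths are an assumption about the pod metadata):
# kubectl get pods --all-namespaces --no-headers \
#   --field-selector spec.nodeName=node1 \
#   -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind \
#   | daemonset_pods
```

If this prints any pods, rerunning the drain with --ignore-daemonsets is the usual next step.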
5. You want to upgrade a large Kubernetes cluster with minimal downtime. Which strategy is best?
hard
A. Upgrade all control plane nodes simultaneously, then all workers simultaneously
B. Skip draining and upgrade nodes in random order
C. Drain all nodes at once, upgrade, then uncordon all nodes
D. Use cloud provider upgrade tools to upgrade control plane, then drain and upgrade workers one by one

Solution

  1. Step 1: Consider cloud provider tools for control plane upgrade

    Cloud tools often automate safe control plane upgrades, reducing manual errors.
  2. Step 2: Upgrade worker nodes one by one with drain/uncordon

    This approach avoids downtime by keeping workloads running on other nodes during upgrade.
  3. Step 3: Evaluate other options for risks

    Upgrading all nodes simultaneously or skipping drain risks downtime and pod failures.
  4. Final Answer:

    Use cloud provider upgrade tools to upgrade control plane, then drain and upgrade workers one by one -> Option D
  5. Quick Check:

    Cloud tools + sequential worker upgrade = D [OK]
Hint: Use cloud tools and upgrade workers one at a time with drain [OK]
Common Mistakes:
  • Upgrading all nodes simultaneously causing downtime
  • Skipping drain causing pod disruption
  • Ignoring cloud provider upgrade features