
Node troubleshooting in Kubernetes - Step-by-Step Execution

Process Flow - Node troubleshooting

Detect Node Issue

↓

Check Node Status

↓

Review Node Events

↓

Inspect Node Logs

↓

Check Node Resources

↓

Restart Node Components

↓

Verify Node Recovery

↓

END

This flow shows the steps to find and fix problems on a Kubernetes node, starting from detecting the issue to verifying recovery.
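The execution sample has no command for the 'Check Node Resources' step. The sketch below is one way to cover it. It assumes metrics-server is installed in the cluster, because 'kubectl top node' needs it to return data. It also assumes the output keeps its default columns: NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%. The function name high_usage_nodes is hypothetical. It reads that table on stdin and prints every node whose CPU or memory percentage exceeds a threshold.

```shell
# Sketch: list nodes whose CPU% or MEMORY% from 'kubectl top node' exceeds a threshold.
# Assumes the default column order: NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%.
# high_usage_nodes is a hypothetical helper; the threshold (percent) defaults to 80.
high_usage_nodes() {
  awk -v limit="${1:-80}" '
    NF < 5 { next }          # ignore blank or short lines
    $1 == "NAME" { next }    # skip the header row if present
    {
      cpu = $3; mem = $5
      sub(/%$/, "", cpu)     # "97%" -> "97"
      sub(/%$/, "", mem)
      # Skip non-numeric values (for example a node with no metrics yet).
      over = (cpu ~ /^[0-9]+$/ && cpu + 0 > limit + 0) || (mem ~ /^[0-9]+$/ && mem + 0 > limit + 0)
      if (over) printf "%s cpu=%s%% mem=%s%%\n", $1, cpu, mem
    }'
}
```

Example use: kubectl top node | high_usage_nodes 85. The 80% default is an arbitrary illustration, not a Kubernetes limit.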

Execution Sample

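kubectl get nodes                                # overall node status
kubectl describe node <node-name>                # conditions and recent events
journalctl -u kubelet                            # run on the node itself (e.g. over SSH)
kubectl cordon <node-name>                       # mark the node unschedulable
kubectl drain <node-name> --ignore-daemonsets    # evict pods; DaemonSet pods are skipped
sudo systemctl restart kubelet                   # run on the node itself, after fixing the cause
kubectl uncordon <node-name>                     # allow scheduling again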

Commands to check node status, see details, view logs, and manage node availability during troubleshooting.
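On a large cluster, step 1 is easier to read if the nodes that are not Ready are filtered out automatically. The minimal sketch below assumes the standard 'kubectl get nodes' table, where STATUS is the second column. The name not_ready_nodes is a hypothetical helper.

```shell
# Sketch: print the names of nodes whose STATUS is not Ready.
# Reads 'kubectl get nodes' output on stdin; not_ready_nodes is a hypothetical helper.
not_ready_nodes() {
  awk '
    NF < 2 { next }                  # ignore blank lines
    $1 == "NAME" { next }            # skip the header row if present
    {
      split($2, parts, ",")          # "Ready,SchedulingDisabled" -> "Ready", "SchedulingDisabled"
      if (parts[1] != "Ready") print $1
    }'
}
```

Example use: kubectl get nodes --no-headers | not_ready_nodes. The function splits STATUS on the comma, so a node that is cordoned but healthy (Ready,SchedulingDisabled) is not reported. A NotReady node is reported whether or not it is cordoned.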

Process Table

Step	Command	Action	Output/Result
1	kubectl get nodes	Check overall node status	List of nodes with Ready/NotReady status
2	kubectl describe node node1	View detailed node info and events	Node conditions and recent events showing errors or warnings
3	journalctl -u kubelet	Inspect kubelet logs (run on the node itself)	Log lines showing errors or warnings from the kubelet
4	kubectl cordon node1	Mark node unschedulable	STATUS gains SchedulingDisabled; no new pods are scheduled on the node
5	kubectl drain node1 --ignore-daemonsets	Evict pods safely	Pods evicted (DaemonSet pods stay); node prepared for maintenance
6	sudo systemctl restart kubelet	Restart kubelet service (run on the node itself)	Kubelet restarted; new log entries no longer show the earlier errors
7	kubectl uncordon node1	Allow scheduling on node	SchedulingDisabled removed; node schedulable again
8	kubectl get nodes	Verify node status	Node status shows Ready
9	-	End troubleshooting	Node is healthy and ready

💡 Node status is Ready, indicating successful troubleshooting and recovery
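Steps 4, 5 and 7 can be wrapped in small shell functions so that the cordon, drain and uncordon commands always run in the same order. This is a sketch only. The function names and the KUBECTL override variable are hypothetical. Step 6 (restarting the kubelet) happens on the node itself, so it is left to the operator.

```shell
# Sketch of steps 4, 5 and 7 from the Process Table as two hypothetical helpers.
# KUBECTL is an assumed override (for a dry run or a different context); it
# defaults to plain kubectl and is deliberately left unquoted so that a value
# such as "kubectl --context staging" splits into separate words.

prepare_node_for_maintenance() {
  local node="$1"
  # Step 4: mark the node unschedulable so no new pods land on it.
  ${KUBECTL:-kubectl} cordon "$node" || return 1
  # Step 5: evict pods; DaemonSet-managed pods are skipped because their
  # controller would immediately recreate them on the same node.
  ${KUBECTL:-kubectl} drain "$node" --ignore-daemonsets || return 1
}

return_node_to_service() {
  local node="$1"
  # Step 7: allow scheduling again once the node reports Ready.
  ${KUBECTL:-kubectl} uncordon "$node"
}
```

Setting KUBECTL=echo prints the commands instead of running them, which gives a dry run. If drain refuses to evict pods that use emptyDir volumes, 'kubectl drain' also accepts --delete-emptydir-data. That flag deletes the data in those volumes, so add it only deliberately.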

Status Tracker

Item	Start	After Step 1	After Step 4	After Step 5	After Step 7	Final
Node Status	Not yet checked	NotReady (or Ready)	NotReady,SchedulingDisabled	NotReady,SchedulingDisabled	Ready	Ready
Pods on Node	Running	Running	Running	Evicted (DaemonSet pods remain)	None until new pods are scheduled	Running (newly scheduled pods)
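The Node Status row combines two separate facts: node health (Ready or NotReady) and schedulability. Cordoning appends SchedulingDisabled to the STATUS column without changing the health part. The sketch below separates the two. The name split_node_status is a hypothetical helper, and it takes a single STATUS value as its argument.

```shell
# Sketch: split a STATUS value from 'kubectl get nodes' into its health part
# and whether the node is schedulable. split_node_status is a hypothetical helper.
split_node_status() {
  local status="$1" health sched
  health="${status%%,*}"                  # text before the first comma
  case "$status" in
    *SchedulingDisabled*) sched="unschedulable" ;;   # set by cordon (and drain)
    *) sched="schedulable" ;;
  esac
  echo "$health $sched"
}
```

For example, split_node_status NotReady,SchedulingDisabled prints "NotReady unschedulable". Only uncordoning changes the second word. Only fixing the node changes the first.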

Key Moments - 3 Insights

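Why do we cordon the node before draining it?
Cordoning marks the node unschedulable, so no new pods are placed on it while its existing pods are being evicted. 'kubectl drain' also cordons the node as its first action, so the separate cordon step mainly makes the intent explicit.

What does 'kubectl describe node' help us find?
The node's conditions (such as Ready, MemoryPressure, DiskPressure and PIDPressure), its capacity and allocated resources, and recent events that can explain why it is NotReady.

Why check kubelet logs during troubleshooting?
The kubelet is the agent on each node that reports node status to the control plane. Its logs show service-level errors, such as failures to reach the container runtime, that may not appear in node events.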
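You can check the conditions that 'kubectl describe node' reports without reading its whole output. Print them as Type=Status pairs with a jsonpath query, then filter out the healthy ones. The minimal sketch below does the filtering. The name unhealthy_conditions is a hypothetical helper. It assumes the usual convention that Ready should be True while the other conditions, such as MemoryPressure, DiskPressure and PIDPressure, should be False.

```shell
# Sketch: report node conditions that are not in their healthy state.
# Input is one "Type=Status" pair per line. Assumption: Ready should be True
# and every other condition type should be False.
unhealthy_conditions() {
  awk -F= '
    NF < 2 { next }                                   # ignore blank lines
    $1 == "Ready" { if ($2 != "True") print; next }
    $2 != "False" { print }'
}
```

Example use: kubectl get node <node-name> -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' | unhealthy_conditions. An empty result means every condition is in its healthy state.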

Visual Quiz

Look at the Status Tracker: what is the node status immediately after step 4 (cordon)?

A. Ready

B. Drained

C. NotReady,SchedulingDisabled

D. Scheduling Allowed

Answer: C. Cordoning adds SchedulingDisabled to the STATUS column but does not change the node's NotReady condition.

Concept Snapshot

Node Troubleshooting in Kubernetes:
- Use 'kubectl get nodes' to check node status
- 'kubectl describe node <name>' shows detailed info and events
- Check kubelet logs on the node itself: 'journalctl -u kubelet'
- Cordon node to stop scheduling: 'kubectl cordon <name>'
- Drain node to evict pods safely: 'kubectl drain <name> --ignore-daemonsets'
- After fixes, uncordon node: 'kubectl uncordon <name>'
- Verify node is Ready before resuming workloads
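Kubelet logs can be long, and keeping only error-level lines speeds up the 'journalctl -u kubelet' step. The sketch below assumes the kubelet writes the default klog text format. In that format each message starts with a severity letter (I, W, E or F) followed by the month and day and a timestamp. Logs configured for JSON output would need a different filter. The name kubelet_errors is a hypothetical helper.

```shell
# Sketch: keep only error (E) and fatal (F) entries from kubelet log output.
# Assumes the klog text header, e.g. "E0612 10:00:01.000002 1234 file.go:200] msg".
kubelet_errors() {
  grep -E '(^|[[:space:]])[EF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}'
}
```

Example use, run on the node itself: journalctl -u kubelet --since "1 hour ago" --no-pager | kubelet_errors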

Full Transcript

Node troubleshooting in Kubernetes starts by detecting the issue, usually by checking node status with 'kubectl get nodes'. If a node is NotReady, use 'kubectl describe node' to see the detailed conditions and events that might explain the problem. Next, inspect the kubelet logs on the node itself with 'journalctl -u kubelet' to find service errors. To fix the node safely, first cordon it so that no new pods are scheduled on it, then drain it to evict the existing pods. After restarting or repairing node services such as the kubelet, uncordon the node to allow scheduling again. Finally, verify that the node status is Ready to confirm recovery. Following these steps in order identifies the cause before any disruptive action, such as draining or restarting services, is taken.
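The final verification step can be scripted as a polling loop instead of rerunning 'kubectl get nodes' by hand. The loop waits for the node's Ready condition to become True. This is a sketch only. The name wait_node_ready and the KUBECTL and POLL_INTERVAL overrides are hypothetical. The default of 24 checks at 5-second intervals (about two minutes) is an arbitrary choice.

```shell
# Sketch: poll until a node's Ready condition reports True, or give up.
# KUBECTL and POLL_INTERVAL are hypothetical overrides (defaults: kubectl, 5 seconds).
wait_node_ready() {
  local node="$1" attempts="${2:-24}" i=1 status
  while [ "$i" -le "$attempts" ]; do
    # Read only the status field of the condition whose type is Ready.
    status=$(${KUBECTL:-kubectl} get node "$node" \
      -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
    if [ "$status" = "True" ]; then
      echo "$node is Ready"
      return 0
    fi
    sleep "${POLL_INTERVAL:-5}"
    i=$((i + 1))
  done
  echo "$node not Ready after $attempts checks" >&2
  return 1
}
```

kubectl also has a built-in equivalent: kubectl wait --for=condition=Ready node/<node-name> --timeout=120s. The hand-written loop is mainly useful when you want custom logging or retries around the check.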

Practice

1. What command shows the current status of all nodes in a Kubernetes cluster? (easy)

A. kubectl get nodes

B. kubectl describe pods

C. kubectl get pods

D. kubectl top pods

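Solution

Step 1: Understand the command purpose. 'kubectl get nodes' lists every node in the cluster with its STATUS, for example Ready, NotReady, or Ready,SchedulingDisabled.

Step 2: Compare with other commands. 'kubectl describe pods', 'kubectl get pods' and 'kubectl top pods' all report on pods, not nodes.

Final Answer: A. kubectl get nodes

Quick Check: the command names the 'nodes' resource, and its STATUS column shows Ready or NotReady.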
