Node troubleshooting in Kubernetes - Commands & Configuration

Introduction
Sometimes a node, one of the machines in your Kubernetes cluster, stops working properly. Node troubleshooting helps you find out what is wrong and fix it so your apps keep running. Typical situations where it applies:
When a node shows as NotReady and your apps are not running on it.
When pods scheduled on a node are stuck in Pending or CrashLoopBackOff states.
When you want to check if a node has enough resources like CPU or memory.
When you suspect network or disk problems on a node.
When you want to see detailed information about a node's status and events.
Commands
This command lists all nodes in the cluster and shows their current status so you can spot any that are NotReady or have issues.
Terminal
kubectl get nodes
Expected Output
NAME           STATUS     ROLES    AGE   VERSION
worker-node1   Ready      <none>   10d   v1.26.1
worker-node2   NotReady   <none>   10d   v1.26.1
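On a cluster with many nodes, it can help to reduce the listing above to just the unhealthy entries. The following is a minimal sketch, assuming the default column layout of `kubectl get nodes` (name first, status second). The function name `not_ready_nodes` is made up for illustration.

```shell
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and print the names of nodes whose status does not start with "Ready".
not_ready_nodes() {
  awk 'NF >= 2 {
    # STATUS may carry extra flags, e.g. "Ready,SchedulingDisabled".
    split($2, parts, ",")
    if (parts[1] != "Ready") print $1
  }'
}

# Usage against a live cluster:
#   kubectl get nodes --no-headers | not_ready_nodes
```

Splitting the status on commas keeps a cordoned but healthy node (`Ready,SchedulingDisabled`) out of the result.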
This command shows detailed information about the node named worker-node2, including its conditions, capacity and allocatable resources, and recent events, to help diagnose problems.
Terminal
kubectl describe node worker-node2
Expected Output
Name:               worker-node2
Roles:              <none>
Labels:             <none>
Annotations:        <none>
CreationTimestamp:  2024-06-01T12:00:00Z
Taints:
Unschedulable:      false
Conditions:
  Type   Status  LastHeartbeatTime     LastTransitionTime    Reason           Message
  ----   ------  -----------------     ------------------    ------           -------
  Ready  False   2024-06-11T10:00:00Z  2024-06-11T09:55:00Z  KubeletNotReady  Kubelet stopped posting status
Addresses:
  InternalIP:  192.168.1.12
Capacity:
  cpu:     4
  memory:  16384Mi
Allocatable:
  cpu:     4
  memory:  16384Mi
Events:
  Type     Reason           Age  From                   Message
  ----     ------           ---  ----                   -------
  Warning  KubeletNotReady  5m   kubelet, worker-node2  Kubelet stopped posting node status.
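The Conditions table is usually the quickest part of this output to act on. As a sketch, the same data can be pulled with a JSONPath query and filtered down to the conditions that look wrong. `failing_conditions` is a hypothetical helper, and the rule it applies (Ready should be True, every other condition should be False) is an assumption that matches the standard node conditions such as MemoryPressure and DiskPressure.

```shell
# Hypothetical helper: read "Type=Status" lines on stdin and print every
# condition that looks unhealthy. Assumes Ready should be True and all
# other conditions (MemoryPressure, DiskPressure, ...) should be False.
failing_conditions() {
  awk -F= '
    NF < 2 { next }                                    # skip blank lines
    $1 == "Ready" && $2 != "True"  { print $1 ": " $2; next }
    $1 != "Ready" && $2 != "False" { print $1 ": " $2 }
  '
}

# Usage against a live cluster (node name taken from the example above):
#   kubectl get node worker-node2 \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | failing_conditions
```

A status of Unknown is reported as well, since it usually means the control plane has stopped hearing from the kubelet.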
This command lists all pods scheduled on the problematic node so you can check whether any of them are stuck or failing there.
Terminal
kubectl get pods --all-namespaces --field-selector spec.nodeName=worker-node2
Expected Output
NAMESPACE     NAME           READY   STATUS             RESTARTS   AGE
default       my-app-1234    0/1     CrashLoopBackOff   3          10m
kube-system   coredns-5678   1/1     Running            0          10d
--all-namespaces - Show pods from all namespaces, not just the default.
--field-selector spec.nodeName=worker-node2 - Filter pods to only those scheduled on worker-node2.
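When a node hosts many pods, the listing can be narrowed to the ones that need attention. This is a rough sketch, assuming the default `kubectl get pods --all-namespaces` columns (NAMESPACE, NAME, READY, STATUS, ...). The helper name `unhealthy_pods` is hypothetical.

```shell
# Hypothetical helper: read `kubectl get pods --all-namespaces --no-headers`
# output on stdin and print pods whose STATUS is neither Running nor Completed.
unhealthy_pods() {
  awk 'NF >= 4 && $4 != "Running" && $4 != "Completed" {
    print $1 "/" $2 " " $4
  }'
}

# Usage against a live cluster:
#   kubectl get pods --all-namespaces --no-headers \
#     --field-selector spec.nodeName=worker-node2 | unhealthy_pods
```

The filter looks only at the STATUS column, so a pod that is Running but not ready (for example READY 0/1) would still need a manual check.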
This command fetches the logs of the failing pod to see error messages that explain why it is crashing.
Terminal
kubectl logs my-app-1234 -n default
Expected Output
Error: failed to connect to database
Retrying in 5 seconds...
-n default - Specify the namespace where the pod is running.
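For a pod in CrashLoopBackOff, the current container may have only just restarted and have little to show. A sketch that also prints the logs of the previous container instance follows. `crash_logs` is a hypothetical wrapper, while `--previous` and `--tail` are standard `kubectl logs` flags. The 50-line tail is an arbitrary choice.

```shell
# Hypothetical wrapper: print recent logs from the current container and from
# the previous (crashed) instance of the same container.
crash_logs() {
  pod=$1
  ns=${2:-default}   # fall back to the "default" namespace
  echo "--- current instance ---"
  kubectl logs "$pod" -n "$ns" --tail=50
  echo "--- previous instance ---"
  kubectl logs "$pod" -n "$ns" --previous --tail=50
}

# Usage against a live cluster:
#   crash_logs my-app-1234 default
```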
Key Concept

If you remember nothing else from node troubleshooting, remember: check node status first, then look at pods on that node and their logs to find the root cause.

Common Mistakes
Mistake: Ignoring node status and directly deleting pods on a problematic node.
Why it fails: The underlying node issue remains, so the pods fail again after they restart.
Fix: Check node status first and fix node problems before managing pods.
Mistake: Not specifying the namespace when checking pod logs.
Why it fails: Without -n, kubectl looks only in the current context's namespace (usually default), so a pod in any other namespace is reported as not found.
Fix: Pass the -n flag with the pod's namespace when accessing pod logs.
Mistake: Assuming all pods on a NotReady node are healthy without checking their status.
Why it fails: Pods may be stuck or crashing, which needs investigation.
Fix: List the pods on the node and check the status of each one.
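For the namespace mistake above, the namespace of a pod can be looked up from the all-namespaces listing before calling `kubectl logs`. This is a minimal sketch that assumes the default column order (NAMESPACE first, NAME second). `pod_namespace` is a hypothetical helper.

```shell
# Hypothetical helper: read `kubectl get pods --all-namespaces --no-headers`
# output on stdin and print the namespace of the pod named in $1.
pod_namespace() {
  awk -v pod="$1" '$2 == pod { print $1 }'
}

# Usage against a live cluster:
#   ns=$(kubectl get pods --all-namespaces --no-headers | pod_namespace my-app-1234)
#   kubectl logs my-app-1234 -n "$ns"
```

If pods with the same name exist in several namespaces, the helper prints one line per match, so the result should be checked before it is passed to `-n`.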
Summary
Use 'kubectl get nodes' to see which nodes are Ready or NotReady.
Use 'kubectl describe node' to get detailed info and events about a node.
List pods on a node with 'kubectl get pods --all-namespaces --field-selector spec.nodeName=NODE_NAME'.
Check pod logs with 'kubectl logs POD_NAME -n NAMESPACE' to find errors causing failures.

Practice

1. What command shows the current status of all nodes in a Kubernetes cluster?
easy
A. kubectl get nodes
B. kubectl describe pods
C. kubectl get pods
D. kubectl top pods

Solution

  1. Step 1: Understand the command purpose

    kubectl get nodes lists all nodes and their status in the cluster.
  2. Step 2: Compare with other commands

    Other commands focus on pods, not nodes, so they don't show node status.
  3. Final Answer:

    kubectl get nodes -> Option A
  4. Quick Check:

    Node status = kubectl get nodes [OK]
Hint: Use 'kubectl get nodes' to see node status quickly [OK]
Common Mistakes:
  • Confusing pods with nodes
  • Using describe instead of get for quick status
  • Trying 'kubectl top pods' for node info
2. Which command syntax correctly shows detailed information about a specific node named node-1?
easy
A. kubectl describe node node-1
B. kubectl get node node-1
C. kubectl get nodes node-1
D. kubectl describe nodes node-1

Solution

  1. Step 1: Identify correct command for detailed info

    kubectl describe node node-1 shows detailed info about the node named node-1.
  2. Step 2: Check syntax correctness

    kubectl accepts both the singular 'node' and the plural 'nodes', so option D also runs; option A is the conventional form for a single node. 'get' shows a one-line summary, not details.
  3. Final Answer:

    kubectl describe node node-1 -> Option A
  4. Quick Check:

    Detailed node info = kubectl describe node [OK]
Hint: Use 'describe' rather than 'get' when you need details on a specific node [OK]
Common Mistakes:
  • Assuming the plural 'nodes' is rejected by describe (kubectl accepts both forms)
  • Using 'get' instead of 'describe' for details
  • Omitting the node name
3. What is the expected output of the command kubectl top node?
medium
A. A list of pods with their resource requests
B. A list of nodes with CPU and memory usage metrics
C. Detailed node configuration and labels
D. A list of nodes with their IP addresses only

Solution

  1. Step 1: Understand the purpose of 'kubectl top node'

    This command shows resource usage like CPU and memory for each node.
  2. Step 2: Differentiate from other outputs

    It does not show pod info, detailed config, or just IP addresses.
  3. Final Answer:

    A list of nodes with CPU and memory usage metrics -> Option B
  4. Quick Check:

    Resource usage per node = kubectl top node [OK]
Hint: Top command shows resource usage, not config or IPs [OK]
Common Mistakes:
  • Confusing node metrics with pod metrics
  • Expecting detailed config from 'top' command
  • Thinking it shows only IP addresses
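To act on the `kubectl top node` metrics rather than just read them, the output can be filtered by a CPU threshold. The sketch below assumes the usual five columns (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%) and a working metrics pipeline such as metrics-server in the cluster. `busy_nodes` and its default threshold of 80 are hypothetical.

```shell
# Hypothetical helper: read `kubectl top node --no-headers` output on stdin
# and print nodes whose CPU% is above the threshold in $1 (default 80).
busy_nodes() {
  awk -v limit="${1:-80}" 'NF >= 5 {
    cpu = $3
    sub(/%$/, "", cpu)                 # "95%" -> "95"
    if (cpu + 0 > limit + 0) print $1 " " $3
  }'
}

# Usage against a live cluster:
#   kubectl top node --no-headers | busy_nodes 80
```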
4. You run kubectl describe node node-2 and see the node is in NotReady state. What is the best first step to troubleshoot?
medium
A. Run kubectl get pods to check pod status
B. Delete the node from the cluster immediately
C. Restart all pods on the node manually
D. Check the node's events section for errors or warnings

Solution

  1. Step 1: Review node events for clues

    The events section in the describe output shows recent errors or warnings that may explain the NotReady state.
  2. Step 2: Avoid premature actions

    Deleting the node or restarting pods without a diagnosis can cause disruption; checking events is a safer first step.
  3. Final Answer:

    Check the node's events section for errors or warnings -> Option D
  4. Quick Check:

    Check events first when node NotReady [OK]
Hint: Look at node events to find issues first [OK]
Common Mistakes:
  • Deleting node without diagnosis
  • Restarting pods blindly
  • Checking pods instead of node events first
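The events shown by `kubectl describe node` can also be queried directly, which is handy when only the warnings matter. This is a sketch under the assumption that the events are still retained by the cluster. `node_warnings` is a hypothetical wrapper, and it searches all namespaces rather than relying on where node events are stored.

```shell
# Hypothetical wrapper: list Warning events whose involved object is the
# node named in $1, oldest first.
node_warnings() {
  kubectl get events --all-namespaces \
    --field-selector "involvedObject.kind=Node,involvedObject.name=$1,type=Warning" \
    --sort-by=.metadata.creationTimestamp
}

# Usage against a live cluster:
#   node_warnings node-2
```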
5. A node shows high CPU usage and pods are evicted frequently. Which combined steps help troubleshoot and fix this?
hard
A. Scale down all deployments to zero immediately
B. Delete the node and recreate it to reset CPU usage
C. Use kubectl top node to confirm CPU load, then check pod resource requests and limits
D. Run kubectl describe pod on all pods to find errors

Solution

  1. Step 1: Confirm node CPU usage

    Run kubectl top node to verify high CPU load on the node.
  2. Step 2: Check pod resource settings

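    Review pods' resource requests and limits to see whether they overcommit the node. The kubelet evicts pods under memory, disk, or PID pressure rather than CPU pressure, so check memory settings as well as CPU.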
  3. Step 3: Adjust resources or scale pods

    Based on findings, adjust pod resource limits or scale workloads to reduce CPU pressure.
  4. Final Answer:

    Use kubectl top node to confirm CPU load, then check pod resource requests and limits -> Option C
  5. Quick Check:

    Check CPU usage and pod limits to fix evictions [OK]
Hint: Check node CPU then pod limits to fix evictions [OK]
Common Mistakes:
  • Deleting node without analysis
  • Scaling down all deployments blindly
  • Checking pods errors without resource context
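The "check pod resource requests and limits" step can be done for a whole node in one listing. The sketch below uses the standard `custom-columns` output format. The function name `pod_resources_on_node` and the column headers are illustrative choices, not fixed names.

```shell
# Hypothetical wrapper: show CPU and memory requests/limits for every pod
# scheduled on the node named in $1. Pods with several containers show one
# comma-separated value per container.
pod_resources_on_node() {
  cols='NAMESPACE:.metadata.namespace,POD:.metadata.name'
  cols="$cols,CPU_REQ:.spec.containers[*].resources.requests.cpu"
  cols="$cols,CPU_LIM:.spec.containers[*].resources.limits.cpu"
  cols="$cols,MEM_REQ:.spec.containers[*].resources.requests.memory"
  cols="$cols,MEM_LIM:.spec.containers[*].resources.limits.memory"
  kubectl get pods --all-namespaces \
    --field-selector "spec.nodeName=$1" \
    -o "custom-columns=$cols"
}

# Usage against a live cluster:
#   pod_resources_on_node worker-node2
```

Pods that set no requests or limits are worth a closer look, because they are among the first candidates for eviction when the node runs short of memory.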