Node troubleshooting in Kubernetes - Commands & Configuration

Introduction

Sometimes a computer in your Kubernetes cluster, called a node, stops working properly. Node troubleshooting helps you find out what is wrong and fix it so your apps keep running smoothly.

Node troubleshooting is useful in these situations:

When a node shows as NotReady and your apps are not running on it.

When pods scheduled on a node are stuck in Pending or CrashLoopBackOff states.

When you want to check if a node has enough resources like CPU or memory.

When you suspect network or disk problems on a node.

When you want to see detailed information about a node's status and events.

Commands

This command lists all nodes in the cluster and shows their current status so you can spot any that are NotReady or have issues.

Terminal

kubectl get nodes

Expected Output

NAME           STATUS     ROLES    AGE   VERSION
worker-node1   Ready      <none>   10d   v1.26.1
worker-node2   NotReady   <none>   10d   v1.26.1
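On a large cluster it can help to print only the nodes that are not Ready. The following is a minimal sketch, not a kubectl feature: `not_ready_nodes` is a hypothetical helper name, and it assumes the default column layout shown above (NAME first, STATUS second).

```shell
# Print the names of nodes whose STATUS column is not exactly "Ready".
# Reads `kubectl get nodes` output on stdin, so it also works on saved output.
# not_ready_nodes is a hypothetical helper name chosen for this example.
not_ready_nodes() {
  # NR > 1 skips the header row; $1 is NAME and $2 is STATUS.
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage against a live cluster:
#   kubectl get nodes | not_ready_nodes
```

Because the comparison is an exact match, a cordoned node whose status reads Ready,SchedulingDisabled is also listed, which is usually what you want when looking for nodes that need attention.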

This command shows detailed information about the node named worker-node2, including its conditions, capacity and allocatable resources, and recent events, to help diagnose problems.

Terminal

kubectl describe node worker-node2

Expected Output

Name:               worker-node2
Roles:              <none>
Labels:             <none>
Annotations:        <none>
CreationTimestamp:  2024-06-01T12:00:00Z
Taints:
Unschedulable:      false
Conditions:
  Type   Status  LastHeartbeatTime     LastTransitionTime    Reason           Message
  ----   ------  -----------------     ------------------    ------           -------
  Ready  False   2024-06-11T10:00:00Z  2024-06-11T09:55:00Z  KubeletNotReady  Kubelet stopped posting status
Addresses:
  InternalIP:  192.168.1.12
Capacity:
  cpu:     4
  memory:  16384Mi
Allocatable:
  cpu:     4
  memory:  16384Mi
Events:
  Type     Reason           Age  From                   Message
  ----     ------           ---  ----                   -------
  Warning  KubeletNotReady  5m   kubelet, worker-node2  Kubelet stopped posting node status.
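When only the conditions matter, the same information can be pulled out in a compact form with a JSONPath query. This is a sketch under assumptions: `node_conditions` is a hypothetical helper name, and the query reads the `.status.conditions` list of the Node object, which typically includes Ready, MemoryPressure, DiskPressure, and PIDPressure.

```shell
# Print one line per node condition: TYPE, STATUS, REASON (tab-separated).
# node_conditions is a hypothetical helper name chosen for this example.
# $1 is the node name, for example worker-node2.
node_conditions() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}'
}

# Usage:
#   node_conditions worker-node2
```

A pressure condition reporting True points toward memory, disk, or process exhaustion on the node, while a Ready condition that is False or Unknown usually points toward the kubelet or the node's connectivity.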

This command lists all pods scheduled on the problematic node so you can check whether any of them are stuck or failing there.

Terminal

kubectl get pods --all-namespaces --field-selector spec.nodeName=worker-node2

Expected Output

NAMESPACE     NAME           READY   STATUS             RESTARTS   AGE
default       my-app-1234    0/1     CrashLoopBackOff   3          10m
kube-system   coredns-5678   1/1     Running            0          10d

→ --all-namespaces - Show pods from all namespaces, not just the current one.

→ --field-selector spec.nodeName=worker-node2 - Filter pods to only those scheduled on worker-node2.
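To narrow the list to pods that need attention, the output can be filtered on the STATUS column. The sketch below is one possible approach, not a built-in kubectl option: `unhealthy_pods_on_node` is a hypothetical helper name, and it assumes the column order shown above, where STATUS is the fourth column when --all-namespaces is used.

```shell
# Print pods on a node whose STATUS is neither Running nor Completed.
# unhealthy_pods_on_node is a hypothetical helper name chosen for this example.
# $1 is the node name, for example worker-node2.
unhealthy_pods_on_node() {
  # --no-headers drops the header row so awk sees only pod rows.
  # With --all-namespaces the columns are:
  #   NAMESPACE NAME READY STATUS RESTARTS AGE, so $4 is STATUS.
  kubectl get pods --all-namespaces --no-headers \
    --field-selector "spec.nodeName=$1" |
    awk '$4 != "Running" && $4 != "Completed"'
}

# Usage:
#   unhealthy_pods_on_node worker-node2
```

Filtering on the displayed STATUS text rather than on the pod phase is deliberate: a pod in CrashLoopBackOff can still report the Running phase, so a phase-based field selector would miss it.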

This command fetches the logs of the failing pod to see error messages that explain why it is crashing.

Terminal

kubectl logs my-app-1234 -n default

Expected Output

Error: failed to connect to database
Retrying in 5 seconds...

→ -n default - Specify the namespace where the pod is running.
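For a pod in CrashLoopBackOff, the current container may have just restarted and logged very little, so the output of the previous, crashed container is often more useful. The following is a small sketch: `crash_logs` is a hypothetical wrapper name, and the limit of 50 lines is an arbitrary choice for illustration.

```shell
# Show the tail of the logs from the previous (crashed) container of a pod.
# crash_logs is a hypothetical helper name chosen for this example.
# $1 is the namespace, $2 is the pod name.
crash_logs() {
  # --previous selects the last terminated container instance;
  # --tail=50 limits the output to the most recent 50 lines.
  kubectl logs "$2" -n "$1" --previous --tail=50
}

# Usage:
#   crash_logs default my-app-1234
```

If the container has never restarted, there is no previous instance and kubectl reports an error instead of logs; in that case drop --previous.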

Key Concept

If you remember nothing else from node troubleshooting, remember: check node status first, then look at pods on that node and their logs to find the root cause.

Common Mistakes

Ignoring node status and directly deleting pods on a problematic node.

The underlying node issue remains, causing pods to fail again after restart.

First check node status and fix node problems before managing pods.

Not specifying the namespace when checking pod logs.

Without -n, kubectl looks only in the current context's namespace (usually default), so a pod in any other namespace produces a "not found" error.

Pass the -n flag with the correct namespace when accessing pod logs.

Assuming all pods on a NotReady node are healthy without checking their status.

Pods may be stuck or crashing, which needs investigation.

List pods on the node and check their status carefully.
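For the namespace mistake above, the namespace of a pod can be looked up by name before fetching its logs. This is a sketch under assumptions: `pod_namespace` is a hypothetical helper name, and it relies on the metadata.name field selector to search all namespaces.

```shell
# Print the namespace (or namespaces) that contain a pod with the given name.
# pod_namespace is a hypothetical helper name chosen for this example.
# $1 is the pod name, for example my-app-1234.
pod_namespace() {
  kubectl get pods --all-namespaces \
    --field-selector "metadata.name=$1" \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\n"}{end}'
}

# Usage:
#   kubectl logs my-app-1234 -n "$(pod_namespace my-app-1234)"
```

Pod names are unique only within a namespace, so the helper can print more than one line; check the output before passing it to -n.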

Summary

Use 'kubectl get nodes' to see which nodes are Ready or NotReady.

Use 'kubectl describe node NODE_NAME' to get detailed info and events about a node.

List pods on a node with 'kubectl get pods --all-namespaces --field-selector spec.nodeName=NODE_NAME'.

Check pod logs with 'kubectl logs POD_NAME -n NAMESPACE' to find errors causing failures.

Practice

1. What command shows the current status of all nodes in a Kubernetes cluster?

easy

A. kubectl get nodes

B. kubectl describe pods

C. kubectl get pods

D. kubectl top pods

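Solution

Step 1: Understand the command purpose. kubectl get nodes lists every node in the cluster together with its status, such as Ready or NotReady.

Step 2: Compare with other commands. kubectl describe pods, kubectl get pods, and kubectl top pods all report on pods, not nodes.

Final Answer: A. kubectl get nodes

Quick Check: The question asks about nodes, and only option A operates on nodes.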