Why cluster monitoring matters in Kubernetes - Performance Analysis
Monitoring a Kubernetes cluster helps us see how the system behaves as it grows.
We want to know how the cost of monitoring changes when the cluster size increases.
Analyze the time complexity of the following monitoring setup.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-monitor
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
    - port: web
      interval: 30s
This manifest defines a ServiceMonitor that selects the Services labeled 'app: example-app' and scrapes the port named 'web' on every pod behind them every 30 seconds.
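The selection rule behind this manifest can be sketched in Python. `matchLabels` requires every listed key/value pair to be present on a Service, and each matched Service contributes one scrape target per pod endpoint. The `Service` class, the sample Services, and their endpoint counts are hypothetical. This models only the label-matching rule, not the Prometheus Operator's actual code.

```python
from dataclasses import dataclass


@dataclass
class Service:
    # Hypothetical stand-in for a Kubernetes Service and its ready endpoints.
    name: str
    labels: dict[str, str]
    endpoint_count: int  # one endpoint per ready pod behind the Service


def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """matchLabels semantics: every selector key/value must appear in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def scrape_targets(selector: dict[str, str], services: list[Service]) -> int:
    """Count targets scraped per interval: one per endpoint of each matched Service."""
    return sum(s.endpoint_count for s in services if matches(selector, s.labels))


services = [
    Service("example-app", {"app": "example-app"}, endpoint_count=10),
    Service("other-app", {"app": "other-app"}, endpoint_count=5),
]
print(scrape_targets({"app": "example-app"}, services))  # 10
```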
- Primary operation: Scraping metrics from each pod matching the label.
- How many times: Once per pod, repeated every 30 seconds.
As the number of pods increases, the monitoring system must scrape more endpoints.
| Input Size (n) | Approx. Operations per Interval |
|---|---|
| 10 pods | 10 scrapes |
| 100 pods | 100 scrapes |
| 1000 pods | 1000 scrapes |
Pattern observation: The number of scraping operations grows directly with the number of pods.
Time Complexity: O(n)
This means the monitoring work grows linearly as the cluster size grows.
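The linear relationship can be checked with a few lines of arithmetic. This Python sketch assumes one scrape endpoint per pod, as in the table; the helper names are hypothetical.

```python
def scrapes_per_interval(pods: int) -> int:
    # One scrape per matching pod endpoint in each interval: O(n).
    return pods


def scrapes_per_hour(pods: int, interval_seconds: int) -> int:
    # Scrape rounds in one hour multiplied by the targets scraped per round.
    rounds = 3600 // interval_seconds
    return rounds * scrapes_per_interval(pods)


for n in (10, 100, 1000):
    print(n, scrapes_per_interval(n), scrapes_per_hour(n, 30))
# 10 10 1200
# 100 100 12000
# 1000 1000 120000
```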
[X] Wrong: "Monitoring cost stays the same no matter how many pods exist."
[OK] Correct: Each pod adds more endpoints to scrape, so more work is needed as pods increase.
Understanding how monitoring scales helps you design systems that stay reliable as they grow.
"What if the monitoring interval changes from 30 seconds to 10 seconds? How would the time complexity change?"
Practice
Solution
Step 1: Understand the purpose of monitoring
Monitoring tracks system health and performance to spot issues early.
Step 2: Compare options with monitoring goals
Only early problem detection and health maintenance match monitoring's purpose.
Final Answer:
It helps detect problems early and keeps the system healthy. -> Option B
Quick Check:
Monitoring = Early problem detection [OK]
- Confusing monitoring with automatic scaling
- Thinking monitoring replaces backups
- Assuming monitoring deletes containers
Solution
Step 1: Identify command to list nodes
The command 'kubectl get nodes' lists all cluster nodes and their status.
Step 2: Eliminate other commands
'kubectl get pods' lists pods, not nodes; 'kubectl describe service' shows service details; 'kubectl logs' shows pod logs.
Final Answer:
kubectl get nodes -> Option A
Quick Check:
Nodes status = kubectl get nodes [OK]
- Using 'kubectl get pods' to check nodes
- Confusing logs with node status
- Describing services instead of nodes
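When `kubectl get nodes` lists many nodes, a short script can flag the ones that are not Ready. This minimal Python sketch parses the default table layout (NAME, STATUS, ROLES, AGE, VERSION); the sample output and the `not_ready_nodes` helper are hypothetical. For anything beyond a quick check, `kubectl get nodes -o json` gives structured data that is safer to parse than column text.

```python
def not_ready_nodes(output: str) -> list[str]:
    """Return names of nodes whose STATUS does not include Ready.

    Expects the default table printed by `kubectl get nodes`;
    column-based parsing is an assumption of this sketch.
    """
    lines = output.strip().splitlines()
    problems = []
    for line in lines[1:]:  # skip the header row
        name, status = line.split()[:2]
        # STATUS can carry extra flags, e.g. "Ready,SchedulingDisabled".
        if "Ready" not in status.split(","):
            problems.append(name)
    return problems


sample = """NAME     STATUS     ROLES           AGE   VERSION
node-1   Ready      control-plane   10d   v1.29.0
node-2   NotReady   <none>          10d   v1.29.0
node-3   Ready      <none>          10d   v1.29.0"""
print(not_ready_nodes(sample))  # ['node-2']
```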
Given this output from 'kubectl top nodes', what does it indicate?
NAME     CPU(cores)   MEMORY(bytes)
node-1   250m         512Mi
node-2   900m         1Gi
node-3   100m         256Mi
Solution
Step 1: Analyze CPU and memory usage per node
node-2 shows 900m CPU and 1Gi memory, which is higher than node-1 and node-3.
Step 2: Compare usage values
node-3 has the lowest CPU (100m), node-1 has moderate CPU (250m), and node-2 is highest in both CPU and memory.
Final Answer:
node-2 is under heavy CPU and memory load compared to the others. -> Option D
Quick Check:
Highest CPU and memory = node-2 [OK]
- Mistaking 100m for the highest CPU
- Assuming equal resource usage
- Confusing memory units
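Comparing nodes correctly means normalizing units first: `m` means millicores and `Mi`/`Gi` are binary memory units. This Python sketch parses the three-column output shown in the exercise and looks columns up by header name, so the extra CPU% and MEMORY% columns that real `kubectl top nodes` output includes do not break it. The helper names are hypothetical, and only the Ki/Mi/Gi memory suffixes are handled.

```python
# Binary memory suffixes converted to mebibytes (Mi). Other suffixes or plain
# byte counts are not handled in this sketch and raise KeyError.
MEMORY_TO_MI = {"Ki": 1 / 1024, "Mi": 1.0, "Gi": 1024.0}


def cpu_millicores(value: str) -> float:
    # "250m" is 250 millicores; a bare number such as "2" means whole cores.
    if value.endswith("m"):
        return float(value[:-1])
    return float(value) * 1000


def memory_mi(value: str) -> float:
    # Split "512Mi" into "512" and "Mi", then scale to mebibytes.
    number, suffix = value[:-2], value[-2:]
    return float(number) * MEMORY_TO_MI[suffix]


def busiest_node(output: str) -> tuple[str, str]:
    """Return (highest-CPU node, highest-memory node) from `kubectl top nodes` text."""
    lines = output.strip().splitlines()
    header = lines[0].split()
    # Look columns up by name so extra columns (e.g. CPU%, MEMORY%) do not matter.
    cpu_col = header.index("CPU(cores)")
    mem_col = header.index("MEMORY(bytes)")
    rows = [line.split() for line in lines[1:]]
    by_cpu = max(rows, key=lambda row: cpu_millicores(row[cpu_col]))
    by_mem = max(rows, key=lambda row: memory_mi(row[mem_col]))
    return by_cpu[0], by_mem[0]


sample = """NAME     CPU(cores)   MEMORY(bytes)
node-1   250m         512Mi
node-2   900m         1Gi
node-3   100m         256Mi"""
print(busiest_node(sample))  # ('node-2', 'node-2')
```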
You get no metrics when running 'kubectl top nodes'. What is the most likely cause?
Solution
Step 1: Understand what provides metrics for 'kubectl top'
The metrics-server collects resource usage data for nodes and pods and serves it through the Metrics API.
Step 2: Identify why metrics might be missing
If metrics-server is missing or not running, 'kubectl top' has no data to show.
Final Answer:
Metrics-server is not installed or running. -> Option C
Quick Check:
Missing metrics = metrics-server issue [OK]
- Blaming kubectl version without checking metrics-server
- Assuming nodes are offline without verification
- Thinking pod labels affect node metrics
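To rule out the metrics-server before blaming anything else, a script can check whether its Deployment has available replicas. This sketch assumes the upstream default install (a Deployment named `metrics-server` in the `kube-system` namespace) and a `kubectl` binary on the PATH. The function names are hypothetical, and the runner is injectable so the logic can be exercised without a cluster.

```python
import subprocess
from typing import Callable

Runner = Callable[[list[str]], subprocess.CompletedProcess]


def _kubectl(args: list[str]) -> subprocess.CompletedProcess:
    # Real runner: calls the kubectl binary and captures its text output.
    return subprocess.run(["kubectl", *args], capture_output=True, text=True)


def metrics_server_available(run: Runner = _kubectl) -> bool:
    """Return True if the metrics-server Deployment reports available replicas.

    Assumes the default Deployment name and namespace (see lead-in).
    """
    result = run([
        "get", "deployment", "metrics-server",
        "-n", "kube-system",
        "-o", "jsonpath={.status.availableReplicas}",
    ])
    if result.returncode != 0:
        # A non-zero exit usually means the Deployment was not found.
        return False
    replicas = result.stdout.strip()
    # An empty field means no replica is available yet.
    return replicas.isdigit() and int(replicas) > 0
```

Checking the `v1beta1.metrics.k8s.io` APIService with `kubectl get apiservice` is another common way to confirm that the Metrics API is registered.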
Solution
Step 1: Identify monitoring tool for alerts
Prometheus collects metrics and supports alerting rules for conditions like high CPU.
Step 2: Evaluate options for reliability
Manual checks are slow and error-prone; restarting nodes blindly is not a solution; disabling monitoring removes visibility.
Final Answer:
Use Prometheus to monitor node metrics and configure alert rules for CPU thresholds. -> Option A
Quick Check:
Automated alerts = Prometheus + alert rules [OK]
- Relying on manual checks only
- Restarting nodes without cause
- Disabling monitoring to avoid alerts
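Prometheus alert rules pair a PromQL condition with an optional `for` duration, and the alert stays pending until the condition has held for that long. The Python sketch below models only that waiting behaviour, using hypothetical samples and a hypothetical `should_fire` helper; it is not how Prometheus evaluates rules internally. In a real cluster the samples would come from an expression such as `100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))`, which assumes node-exporter metrics are being scraped.

```python
def should_fire(samples: list[float], threshold: float, for_evaluations: int) -> bool:
    """Simplified model of an alert rule with a `for` duration.

    Fires only if each of the most recent `for_evaluations` samples is above
    `threshold`, so one short spike does not trigger the alert.
    """
    if for_evaluations < 1:
        raise ValueError("for_evaluations must be at least 1")
    if len(samples) < for_evaluations:
        return False  # not enough history yet: the alert would still be pending
    return all(value > threshold for value in samples[-for_evaluations:])


# Hypothetical node CPU utilisation samples (percent), one per evaluation.
sustained = [60.0, 92.0, 95.0, 97.0]
spike = [60.0, 95.0, 70.0, 97.0]
print(should_fire(sustained, threshold=90.0, for_evaluations=3))  # True
print(should_fire(spike, threshold=90.0, for_evaluations=3))      # False
```

Requiring several consecutive breaches trades a little detection delay for far fewer false alarms from brief spikes.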
