
Why monitoring prevents outages in Kafka

Introduction
Systems can stop working suddenly, causing problems for users. Monitoring helps catch issues early by watching system health and performance, so you can fix problems before they cause outages.
When to Use
When you want to know if your Kafka brokers are running smoothly without delays
When you need to detect if message processing is slowing down before it causes failures
When you want alerts if disk space or memory on Kafka servers is running low
When you want to track consumer lag to avoid data loss or delays
When you want to keep an eye on network or CPU usage to prevent crashes
Config File - kafka-monitoring.properties
group.id=kafka-monitoring-group
bootstrap.servers=localhost:9092
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
enable.auto.commit=false

This configuration file sets up a Kafka consumer that joins the group 'kafka-monitoring-group' for monitoring purposes. It connects to the Kafka broker at localhost:9092, reads message keys and values as strings, and disables automatic offset commits so the consumer decides when an offset is committed.

Commands
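This command shows details about the Kafka topic 'my-topic', including the partition count and, for each partition, the leader, the replicas, and the in-sync replicas (Isr). A partition whose Isr list is shorter than its Replicas list is under-replicated, which is an early sign of an unhealthy topic.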
Terminal
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-topic
Expected Output
Topic: my-topic  PartitionCount: 3  ReplicationFactor: 2  Configs:
    Topic: my-topic  Partition: 0  Leader: 1  Replicas: 1,2  Isr: 1,2
    Topic: my-topic  Partition: 1  Leader: 2  Replicas: 2,1  Isr: 2,1
    Topic: my-topic  Partition: 2  Leader: 1  Replicas: 1,2  Isr: 1,2
--bootstrap-server - Specifies the Kafka server to connect to
--describe - Shows detailed information about the topic
--topic - Specifies which topic to describe
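The describe output above can also be scanned automatically. The following is a minimal sketch, assuming the output keeps the labelled "Replicas:" and "Isr:" fields shown above; the function name under_replicated is hypothetical and not part of Kafka.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kafka): reads `kafka-topics.sh --describe`
# output on stdin and prints each partition whose in-sync replica list (Isr)
# is shorter than its replica list (Replicas).
under_replicated() {
  awk '
    /Partition:/ && /Replicas:/ && /Isr:/ {
      topic = ""; part = ""; reps = ""; isr = ""
      # Look fields up by label so extra columns do not break the parsing.
      for (i = 1; i < NF; i++) {
        if ($i == "Topic:")     topic = $(i + 1)
        if ($i == "Partition:") part  = $(i + 1)
        if ($i == "Replicas:")  reps  = $(i + 1)
        if ($i == "Isr:")       isr   = $(i + 1)
      }
      # An empty Isr value would pick up the next label; treat it as empty.
      if (isr ~ /:$/) isr = ""
      nr = split(reps, a, ",")
      ni = split(isr, b, ",")
      if (ni < nr) printf "%s-%s replicas=%s isr=%s\n", topic, part, reps, isr
    }
  '
}

# Usage (requires a running broker):
#   kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-topic | under_replicated
```

For the healthy topic shown above this prints nothing. kafka-topics.sh also accepts --under-replicated-partitions together with --describe, which filters on the broker side; the sketch is mainly useful when you already have saved output to inspect.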
This command shows the status of the consumer group 'kafka-monitoring-group'. For each partition, LAG is LOG-END-OFFSET minus CURRENT-OFFSET, so a growing LAG value means consumers are falling behind.
Terminal
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group kafka-monitoring-group
Expected Output
GROUP                   TOPIC     PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                      HOST        CLIENT-ID
kafka-monitoring-group  my-topic  0          100             105             5    consumer-1-1234abcd-5678-efgh-ijkl-9012mnop3456  /127.0.0.1  consumer-1
kafka-monitoring-group  my-topic  1          200             200             0    consumer-2-1234abcd-5678-efgh-ijkl-9012mnop3456  /127.0.0.1  consumer-2
kafka-monitoring-group  my-topic  2          150             155             5    consumer-3-1234abcd-5678-efgh-ijkl-9012mnop3456  /127.0.0.1  consumer-3
--bootstrap-server - Specifies the Kafka server to connect to
--describe - Shows detailed information about the consumer group
--group - Specifies which consumer group to check
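To turn the lag column into an alert, the output can be summed and compared with a threshold. This is a rough sketch, assuming the header row contains a column named LAG as in the output above; total_lag and check_lag are hypothetical helper names, and the threshold is something you would choose for your own workload.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Kafka): both read
# `kafka-consumer-groups.sh --describe` output on stdin.

# Sum the LAG column across all partitions.
total_lag() {
  awk '
    # Until the header row is seen, look for the LAG column position.
    lag_col == 0 {
      for (i = 1; i <= NF; i++) if ($i == "LAG") lag_col = i
      next
    }
    # Count numeric lag values only; a "-" (no committed offset) is skipped.
    $lag_col ~ /^[0-9]+$/ { total += $lag_col }
    END { print total + 0 }
  '
}

# Print OK or ALERT and return non-zero when total lag exceeds the threshold.
check_lag() {
  local threshold=$1 lag
  lag=$(total_lag)
  if [ "$lag" -gt "$threshold" ]; then
    echo "ALERT: total lag $lag exceeds $threshold"
    return 1
  fi
  echo "OK: total lag $lag"
}

# Usage (requires a running broker):
#   kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
#     --describe --group kafka-monitoring-group | check_lag 1000
```

With the sample output above, the total lag is 10, so a threshold of 1000 reports OK. A single reading says little by itself; lag that keeps growing between runs is the signal worth alerting on.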
This command prints a class histogram of the Kafka broker's JVM heap, showing the number of instances and bytes used per class. Comparing histograms taken at different times helps you monitor memory usage and spot potential memory leaks.
Terminal
jcmd $(pgrep -f kafka.Kafka) GC.class_histogram
Expected Output
 num     #instances         #bytes  class name
----------------------------------------------
   1:        100000       16000000  [C
   2:         50000       12000000  java.lang.String
   3:         20000        8000000  org.apache.kafka.common.network.Selector
   4:         15000        6000000  java.util.HashMap$Node
   5:         10000        4000000  java.nio.HeapByteBuffer
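One histogram alone rarely reveals a leak; growth between two snapshots is more telling. The sketch below assumes two histograms were saved to files earlier (the file names in the usage comment are made up) and that each data row has the "num: instances bytes class" layout shown above. The function name histogram_growth is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compares two saved GC.class_histogram outputs and
# prints each class whose byte count grew, followed by the growth in bytes.
# Arguments: older snapshot file, newer snapshot file.
histogram_growth() {
  awk '
    # Only data rows start with a rank such as "1:".
    $1 ~ /^[0-9]+:$/ {
      if (FNR == NR) {
        old[$4] = $3 + 0            # first file: remember bytes per class
      } else if ($3 + 0 > old[$4] + 0) {
        print $4, $3 - old[$4]      # second file: report classes that grew
      }
    }
  ' "$1" "$2"
}

# Usage (file names are illustrative):
#   jcmd $(pgrep -f kafka.Kafka) GC.class_histogram > before.txt
#   ... wait a while ...
#   jcmd $(pgrep -f kafka.Kafka) GC.class_histogram > after.txt
#   histogram_growth before.txt after.txt
```

Classes that grow in every comparison over a long period are candidates for closer inspection; growth in a single comparison is often just normal load.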
Key Concept

If you remember nothing else from this pattern, remember: monitoring lets you see problems early so you can fix them before users notice outages.

Common Mistakes
Ignoring consumer lag metrics
Lag means consumers are behind the producers. Growing lag delays processing, and data is lost if messages pass the topic's retention limit before they are consumed.
Regularly check consumer group lag and investigate if lag grows unexpectedly.
Not monitoring Kafka broker resource usage
High CPU, memory, or disk usage can cause Kafka to slow down or crash without warning.
Use system monitoring tools alongside Kafka metrics to track resource usage continuously.
Running monitoring commands without specifying the correct bootstrap server
Commands will fail or show no data if they connect to the wrong Kafka server.
Always use the correct --bootstrap-server flag with the Kafka server address.
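For the resource-usage mistake above, disk space is one of the simplest things to check from the shell. This is a minimal sketch, assuming the POSIX output format of df -P; the function name disk_alerts, the 80 percent limit, and the path /var/lib/kafka are illustrative assumptions, so substitute the directory configured in your broker's log.dirs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `df -P` output on stdin and prints every
# filesystem whose usage is at or above the given percentage.
# Mount points containing spaces are not handled by this sketch.
disk_alerts() {
  local limit=$1
  awk -v limit="$limit" '
    NR > 1 {                      # skip the header row
      pct = $5
      sub(/%/, "", pct)           # "90%" becomes "90"
      if (pct + 0 >= limit + 0) print $6, pct "%"
    }
  '
}

# Usage (the path is an assumption; use your broker's log.dirs):
#   df -P /var/lib/kafka | disk_alerts 80
```

Running a check like this from a scheduler and forwarding any output to your alerting channel gives a warning before the broker's disk fills up.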
Summary
Use kafka-topics.sh to check topic health and partition status.
Use kafka-consumer-groups.sh to monitor consumer lag and group status.
Use system and JVM commands to monitor resource usage and memory health.
Monitoring helps detect issues early to prevent outages.