
Disk I/O optimization in Kafka - Commands & Configuration

Introduction
Disk input/output speed affects how fast Kafka reads and writes messages. Optimizing disk I/O helps Kafka handle more data smoothly and reduces delays.
Tuning disk I/O is worth the effort in these situations:
Kafka brokers process messages slowly because of disk bottlenecks.
You want to improve Kafka throughput for high-volume data streams.
Monitoring shows high disk wait times that affect Kafka performance.
You are setting up Kafka on servers with multiple disks and want to balance the load across them.
You are tuning Kafka for better durability and faster recovery after crashes.
Config File - server.properties
log.dirs=/var/lib/kafka-logs-1,/var/lib/kafka-logs-2
num.io.threads=8
num.replica.fetchers=4
log.flush.interval.messages=10000
log.flush.interval.ms=1000
log.segment.bytes=1073741824
log.retention.hours=168
log.retention.bytes=10737418240

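log.dirs: Lists directories, ideally each on a different physical disk, so that Kafka spreads partition logs across them and no single disk becomes the bottleneck. Kafka places each new partition in the directory that currently holds the fewest partitions; it does not balance by size or load.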

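num.io.threads: Number of threads the broker uses to process requests, which may include disk I/O. More threads allow more requests to be handled in parallel. The value 8 shown here is also the default.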

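num.replica.fetchers: Number of fetcher threads a follower uses to replicate data from each source broker. More fetchers increase replication parallelism, which adds read load on leaders and write load on followers.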

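log.flush.interval.messages and log.flush.interval.ms: Force a flush to disk after the given number of messages or milliseconds. By default Kafka leaves flushing to the operating system and relies on replication for durability, so explicit values trade throughput for a smaller window of unflushed data.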

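log.segment.bytes: Maximum size of a single log segment file before Kafka rolls to a new one. The value shown is 1 GiB. Retention removes whole segments, so the segment size also sets how coarsely old data is deleted.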

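log.retention.hours and log.retention.bytes: Limit how long (168 hours, or 7 days) and how much (10 GiB) data Kafka keeps before deleting old segments, which helps avoid full disks. Note that log.retention.bytes applies per partition, not per broker or per disk.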
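Because the size limit applies per partition, the disk space a broker may need is a multiple of it. The sketch below is a rough estimate only, not a Kafka tool: estimate_retained_bytes is a hypothetical helper, and it assumes each partition can exceed log.retention.bytes by about one segment, since old data is removed a whole segment at a time. Index files and other overhead are ignored.

```shell
# Rough upper bound on retained log bytes for one broker.
# $1 = log.retention.bytes (per partition)
# $2 = log.segment.bytes
# $3 = number of partition replicas hosted on the broker (assumption: you count these yourself)
estimate_retained_bytes() {
  echo $(( ($1 + $2) * $3 ))
}

# Values from the configuration above, with 3 partition replicas on the broker
estimate_retained_bytes 10737418240 1073741824 3   # prints 35433480192
```

That is 33 GiB for three partition replicas, which should fit comfortably within the combined capacity of the disks listed in log.dirs.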

Commands
Starts the Kafka broker using the optimized disk I/O settings from the configuration file.
Terminal
kafka-server-start.sh /opt/kafka/config/server.properties
Expected Output
[2024-06-01 12:00:00,000] INFO Kafka version : 3.5.0 (org.apache.kafka.common.utils.AppInfoParser)
[2024-06-01 12:00:00,001] INFO Kafka startTimeMs : 1685611200000 (org.apache.kafka.common.utils.AppInfoParser)
[2024-06-01 12:00:05,000] INFO [KafkaServer id=0] started (kafka.server.KafkaServer)
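To confirm which disk-related values the broker loads, the keys can be read back from the same file. This is a minimal sketch that assumes plain key=value lines without line continuations; get_prop is a hypothetical helper, not part of the Kafka distribution.

```shell
# Print the value of one key from a properties file.
# $1 = path to the file, $2 = key. Exits non-zero if the key is absent.
get_prop() {
  awk -F= -v key="$2" '
    /^[ \t]*[#!]/       { next }   # skip comment lines
    index($0, "=") == 0 { next }   # skip lines that are not key=value
    {
      k = $1
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      if (k == key) {
        v = substr($0, index($0, "=") + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", v)
        val = v; found = 1         # a later duplicate overrides an earlier one
      }
    }
    END { if (!found) exit 1; print val }
  ' "$1"
}

# Example usage (path taken from the start command above):
#   get_prop /opt/kafka/config/server.properties log.dirs
#   get_prop /opt/kafka/config/server.properties num.io.threads
```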
Prints extended per-device I/O statistics three times at 2-second intervals to help identify bottlenecks. The first report shows averages since boot; the later reports cover each interval.
Terminal
iostat -dx 2 3
Expected Output
Linux 5.15.0-60-generic (kafka-server)  06/01/2024  _x86_64_  (4 CPU)

Device  r/s    w/s    rkB/s    wkB/s    rrqm/s  wrqm/s  r_await  w_await  svctm  %util
sda     50.00  30.00  1024.00  512.00   0.00    0.00    1.00     2.00     0.50   4.00
sdb     70.00  40.00  2048.00  1024.00  0.00    0.00    1.50     1.80     0.40   5.00
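Scanning this output by eye gets tedious on brokers with many disks. The following sketch filters it down to devices whose %util exceeds a threshold. busy_devices is a hypothetical helper, and the default threshold of 80 is an arbitrary assumption, not a Kafka recommendation. The %util column is located from the header row because column order differs between sysstat versions.

```shell
# Read `iostat -dx` output on stdin; print "device %util" for busy devices.
# $1 = %util threshold (assumed default: 80)
busy_devices() {
  awk -v limit="${1:-80}" '
    $1 == "Device" || $1 == "Device:" {
      # Header row: remember which column holds %util
      for (i = 1; i <= NF; i++) if ($i == "%util") col = i
      next
    }
    col && NF >= col && $col + 0 > limit { print $1, $col }
  '
}

# Example usage:
#   iostat -dx 2 3 | busy_devices 80
```

With the sample output above nothing is printed, since both disks are under 10% utilization. A device appears once per report in which it crosses the threshold.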
Shows the partitions, leaders, and replicas of a topic so you can verify how data and replication are distributed across brokers, both of which affect disk I/O.
Terminal
kafka-topics.sh --describe --topic example-topic --bootstrap-server localhost:9092
Expected Output
Topic: example-topic  PartitionCount: 3  ReplicationFactor: 2  Configs:
  Topic: example-topic  Partition: 0  Leader: 1  Replicas: 1,2  Isr: 1,2
  Topic: example-topic  Partition: 1  Leader: 2  Replicas: 2,1  Isr: 2,1
  Topic: example-topic  Partition: 2  Leader: 1  Replicas: 1,2  Isr: 1,2
--describe - Shows detailed information about the topic
--bootstrap-server - Specifies the Kafka server to connect to
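Leaders serve producer writes and replica fetches, so an uneven leader count can concentrate disk load on one broker. As a minimal sketch, the describe output can be tallied per broker. leaders_per_broker is a hypothetical helper, and it assumes the "Leader: N" format shown in the expected output.

```shell
# Read `kafka-topics.sh --describe` output on stdin;
# print "broker_id leader_count", sorted by broker id.
leaders_per_broker() {
  awk '
    {
      # Count the token that follows each "Leader:" label
      for (i = 1; i < NF; i++)
        if ($i == "Leader:") count[$(i + 1)]++
    }
    END { for (b in count) print b, count[b] }
  ' | sort -n
}

# Example usage:
#   kafka-topics.sh --describe --topic example-topic \
#     --bootstrap-server localhost:9092 | leaders_per_broker
```

For the sample topic this reports two leaders on broker 1 and one on broker 2.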
Key Concept

If you remember nothing else from disk I/O optimization, remember: spreading Kafka logs across multiple disks and tuning I/O threads improves performance and reduces bottlenecks.

Common Mistakes
Setting log.dirs to only one directory on a single disk.
This causes all disk I/O to hit one disk, creating a bottleneck and slowing Kafka.
Use multiple directories on separate disks in log.dirs to distribute load.
Using too few I/O threads with num.io.threads.
Requests queue up behind slow disk operations because too few threads are available to process them, limiting throughput.
Increase num.io.threads gradually, taking the number of disks and CPU cores into account, and confirm the effect with monitoring.
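Setting log.flush.interval.ms or log.flush.interval.messages too low.
This causes excessive disk flushes, which increase disk I/O and reduce throughput.
Leave these settings at their defaults unless you need forced flushes, and rely on replication for durability. If you do set them, choose values large enough to batch many writes per flush.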
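The first mistake above can be caught with a quick check that the directories in log.dirs do not share a device. The sketch below is illustrative only: split_log_dirs and check_shared_devices are hypothetical helper names, and it assumes the directories exist and contain no spaces. It compares the filesystem reported by df -P, so two partitions of one physical disk would still look separate.

```shell
# Print one directory per line from a comma-separated log.dirs value.
split_log_dirs() {
  printf '%s\n' "$1" | tr ',' '\n' \
    | sed 's/^[ \t]*//; s/[ \t]*$//' | grep -v '^$'
}

# Report directories whose backing filesystem was already seen.
# $1 = the log.dirs value
check_shared_devices() {
  split_log_dirs "$1" | while IFS= read -r dir; do
    # With -P, df prints exactly one data line; column 1 is the device
    dev=$(df -P "$dir" 2>/dev/null | awk 'NR == 2 { print $1 }')
    printf '%s %s\n' "${dev:-unknown}" "$dir"
  done | awk '
    $1 == "unknown" { print "cannot stat: " $2; next }
    ($1 in seen)    { print "shared device " $1 ": " seen[$1] " and " $2; next }
                    { seen[$1] = $2 }
  '
}

# Example usage:
#   check_shared_devices "/var/lib/kafka-logs-1,/var/lib/kafka-logs-2"
```

No output means every directory resolved to a different filesystem.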
Summary
Configure multiple log directories on separate disks to spread disk I/O load.
Adjust num.io.threads so that request processing, including disk operations, has enough parallelism.
Monitor disk usage with iostat to identify bottlenecks and tune settings accordingly.