Consider a Kafka producer configured with batch.size=16384 bytes and linger.ms=0. The producer sends messages of 1000 bytes each. How many messages will be sent in one batch if the producer is continuously sending messages?
Batch size limits the total bytes in a batch. Divide batch size by message size.
The batch size limit is 16384 bytes and each message is 1000 bytes, so at most floor(16384 / 1000) = 16 messages fit in one batch (16 * 1000 = 16000 bytes; a 17th message would exceed the limit). With linger.ms=0 the producer does not wait for batches to fill, but under continuous load messages still accumulate while earlier requests are in flight, so batches can fill up to the 16-message limit.
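The arithmetic can be checked with a short sketch (plain Python, no Kafka client needed; the sizes mirror the question):

```python
# Maximum whole messages that fit in one batch, using the limits
# from the question (batch.size=16384, 1000-byte messages).
BATCH_SIZE_BYTES = 16384
MESSAGE_BYTES = 1000

messages_per_batch = BATCH_SIZE_BYTES // MESSAGE_BYTES
print(messages_per_batch)                  # → 16
print(messages_per_batch * MESSAGE_BYTES)  # bytes actually used → 16000
```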
A Kafka producer sends a batch of 100 messages, each 1KB uncompressed. The producer uses compression.type=gzip. After compression, the batch size is 50KB. What is the approximate compression ratio?
Compression ratio = uncompressed size / compressed size.
Uncompressed size = 100 messages * 1KB = 100KB. Compressed size = 50KB. Ratio = 100KB / 50KB = 2:1.
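Worked out in Python (pure arithmetic; 1KB is taken as 1024 bytes here, though the ratio comes out the same either way):

```python
# Compression ratio = uncompressed bytes / compressed bytes.
KB = 1024
uncompressed = 100 * 1 * KB  # 100 messages of 1 KB each
compressed = 50 * KB         # batch size after gzip, per the question

ratio = uncompressed / compressed
print(f"{ratio:.0f}:1")      # → 2:1
```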
A Kafka producer has batch.size=32768 bytes and linger.ms=100. It sends messages of 5000 bytes each. If the producer sends 3 messages quickly, what happens?
Consider how linger.ms affects batching when batch is not full.
The 3 messages total 15000 bytes, which is less than batch.size=32768, so the batch is not full. With linger.ms=100, the producer waits up to 100ms from the first message's arrival for more data to accumulate. Since the batch never fills, the linger timer expires and all 3 messages are sent together in one batch after roughly 100ms.
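A toy model of the two flush triggers (batch full vs. linger timeout) makes the behavior concrete. This is a simplification for illustration, not the real client's accumulator logic:

```python
def flush_reason(pending_bytes, batch_size, elapsed_ms, linger_ms):
    """Toy model of producer batching: a full batch flushes immediately;
    otherwise the batch waits until the linger timer expires."""
    if pending_bytes >= batch_size:
        return "batch full"
    if elapsed_ms >= linger_ms:
        return "linger expired"
    return "still waiting"

# 3 messages of 5000 bytes against batch.size=32768, linger.ms=100:
print(flush_reason(3 * 5000, 32768, 0, 100))    # → still waiting
print(flush_reason(3 * 5000, 32768, 100, 100))  # → linger expired
```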
Which of the following is true about using compression.type=snappy in Kafka producers compared to gzip?
Think about speed vs compression trade-offs.
Snappy is designed for fast compression with lower CPU usage but a lower compression ratio, while gzip compresses better but is slower and uses more CPU.
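Snappy isn't in the Python standard library, but the same speed-versus-ratio trade-off can be sketched with gzip's compresslevel (level 1 is fast with weaker compression, level 9 is slow with stronger compression). The payload and levels here are illustrative only, not a snappy benchmark:

```python
import gzip
import time

# Compressible sample payload (repetitive JSON-like text).
data = b'{"user": 12345, "event": "click", "page": "/home"}' * 2000

for level in (1, 9):
    start = time.perf_counter()
    out = gzip.compress(data, compresslevel=level)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"level {level}: {len(data)} -> {len(out)} bytes "
          f"(ratio {len(data) / len(out):.1f}:1) in {elapsed_ms:.2f} ms")
```

On repetitive data like this, the higher level produces a smaller output at the cost of more CPU time, mirroring the gzip-versus-snappy choice.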
You want to maximize Kafka producer throughput for large messages (~1MB each) on a network with moderate latency. Which combination is best?
Consider message size, network latency, and compression speed.
For large messages (~1MB), a larger batch size (5MB) allows several messages per batch, improving throughput on a moderate-latency network. LZ4 offers fast compression and decompression, balancing CPU cost against network savings. Smaller batch sizes or no compression would reduce throughput or waste bandwidth.
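Sketched as a configuration, using the Java client's property names written as a Python dict for illustration. The broker address is a placeholder, and note that messages near 1MB typically also require raising max.request.size above its ~1MB default (the exact value chosen here is an assumption):

```python
# Throughput-oriented producer settings for ~1MB messages, per the
# answer above. Property names follow the Java producer client;
# "localhost:9092" is a placeholder broker address.
producer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder
    "batch.size": 5 * 1024 * 1024,          # 5MB: several 1MB messages per batch
    "linger.ms": 50,                        # illustrative: give batches time to fill
    "compression.type": "lz4",              # fast compression, good throughput
    "max.request.size": 8 * 1024 * 1024,    # assumed headroom above the ~1MB default
}

print(producer_config["compression.type"])  # → lz4
```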