
Batch size and compression tuning in Kafka - Step-by-Step Execution

Process Flow - Batch size and compression tuning
Start Producer → Collect Messages → Check Batch Size → Compress Batch → Send Batch → Repeat
A Kafka producer collects messages into batches. When a batch reaches the configured batch size, the producer compresses it and sends it to the broker.
Execution Sample
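from kafka import KafkaProducer  # kafka-python client
# Assumed for illustration: broker address, topic name, and payloads
topic = 'events'
messages = [b'first message', b'second message']

producer = KafkaProducer(
  bootstrap_servers='localhost:9092',
  batch_size=16384,  # 16 KB batch size limit
  compression_type='gzip'
)

for msg in messages:
  producer.send(topic, msg)
producer.flush()  # send any partially filled batch before exiting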
This code sets the batch size limit to 16 KB (16384 bytes) and compresses each batch with gzip before it is sent.
Process Table
| Step | Messages Collected (bytes) | Batch Size Limit (bytes) | Compression | Action | Output |
|---|---|---|---|---|---|
| 1 | 5000 | 16384 | gzip | Collect message | No send yet |
| 2 | 12000 | 16384 | gzip | Collect message | No send yet |
| 3 | 16500 | 16384 | gzip | Batch size exceeded | Compress and send batch |
| 4 | 0 | 16384 | gzip | Reset batch | Ready for next messages |
| 5 | 8000 | 16384 | gzip | Collect message | No send yet |
| 6 | 16000 | 16384 | gzip | Collect message | No send yet |
| 7 | 17000 | 16384 | gzip | Batch size exceeded | Compress and send batch |
| 8 | 0 | 16384 | gzip | Reset batch | Ready for next messages |
| 9 | - | - | - | No more messages | Stop |
💡 There are no more messages to send, so the producer stops collecting.
Status Tracker
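| Variable | Start | After Step 1 | After Step 2 | After Step 3 | After Step 4 | After Step 5 | After Step 6 | After Step 7 | After Step 8 | Final |
|---|---|---|---|---|---|---|---|---|---|---|
| messages_collected_bytes | 0 | 5000 | 12000 | 16500 | 0 | 8000 | 16000 | 17000 | 0 | 0 |
| batch_size_limit | 16384 | 16384 | 16384 | 16384 | 16384 | 16384 | 16384 | 16384 | 16384 | 16384 |
| compression_type | gzip | gzip | gzip | gzip | gzip | gzip | gzip | gzip | gzip | gzip |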
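The walkthrough above can be reproduced with a few lines of plain Python. This is a minimal sketch of the simplified model used on this page, not of the Kafka client's internals: the function name `simulate_batches` and the individual message sizes are assumptions, chosen so the running totals match the table. A real client typically closes a batch before a record would overflow it rather than letting the total overshoot the limit.

```python
def simulate_batches(message_sizes, batch_size_limit=16384):
    """Trace a simplified size-triggered batching loop.

    Returns a list of (step, collected_bytes, action) tuples.
    """
    trace = []
    collected = 0
    step = 0
    for size in message_sizes:
        step += 1
        collected += size
        if collected > batch_size_limit:
            # Limit exceeded: the batch is compressed and sent.
            trace.append((step, collected, "compress and send batch"))
            # The counter reset is recorded as its own step, as in the table.
            step += 1
            collected = 0
            trace.append((step, collected, "reset batch"))
        else:
            trace.append((step, collected, "collect message"))
    return trace


# Hypothetical message sizes that produce the running totals shown above.
sizes = [5000, 7000, 4500, 8000, 8000, 1000]
for row in simulate_batches(sizes):
    print(row)
```

Each printed tuple corresponds to one of steps 1 to 8; step 9 (no more messages) is simply the end of the loop.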
Key Moments - 3 Insights
Why does the batch send happen only after the batch size is exceeded?
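Because the producer waits to fill the batch up to the configured batch size before sending, which reduces the number of network requests. Steps 3 and 7 show this: the collected bytes exceed 16384, which triggers compression and send. In a real producer, the linger.ms setting also bounds how long a partially filled batch can wait before it is sent.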
What happens to the batch size counter after sending a batch?
It resets to zero to start collecting new messages, as seen in steps 4 and 8 where messages_collected_bytes goes back to 0.
Why is compression applied only when sending the batch?
Compression reduces the data size for network transfer and is applied just before sending the batch, as indicated in the 'Action' column at steps 3 and 7.
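One reason compression is applied to the batch as a whole is that similar messages share patterns a compressor can exploit. The sketch below uses only Python's standard `gzip` module to compare compressing messages one by one against compressing them together. The JSON payloads are hypothetical, and joining the raw bytes is only a stand-in for Kafka's actual batch format.

```python
import gzip
import json

# Hypothetical JSON events with repeated field names and values.
messages = [
    json.dumps({"user_id": i, "event": "page_view", "path": "/products/list"}).encode("utf-8")
    for i in range(200)
]

raw_bytes = sum(len(m) for m in messages)
# Compressing each message alone pays the gzip header cost every time.
per_message_bytes = sum(len(gzip.compress(m)) for m in messages)
# Compressing the joined batch lets gzip reuse patterns across messages.
batch_bytes = len(gzip.compress(b"".join(messages)))

print(f"raw:              {raw_bytes} bytes")
print(f"gzip per message: {per_message_bytes} bytes")
print(f"gzip whole batch: {batch_bytes} bytes")
```

The exact numbers depend on the payloads, but for repetitive data like this the whole-batch figure is the smallest of the three.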
Visual Quiz - 3 Questions
Test your understanding
Look at the Process Table. What is the messages_collected_bytes value just before the first batch is sent?
A. 12000
B. 16500
C. 5000
D. 0
💡 Hint
Check Step 3 in the Process Table, where exceeding the batch size triggers the send.
At which step does the batch size counter reset to zero after sending?
A. Step 7
B. Step 3
C. Step 4
D. Step 5
💡 Hint
In the Status Tracker, look for the point where messages_collected_bytes returns to 0 after the first send.
If the batch_size was increased to 20000 bytes, how would the action at Step 3 change?
A. Batch would not be sent at Step 3 because the size is below the new limit
B. Compression would be disabled
C. Batch would be sent at Step 3 as before
D. Producer would stop collecting messages
💡 Hint
Compare messages_collected_bytes with batch_size_limit in the Process Table rows.
Concept Snapshot
Kafka producer batches messages before sending.
Batch size controls when to send.
Compression reduces data size before send.
Larger batch size means fewer sends but more latency.
Tune batch size and compression for performance.
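To make the "fewer sends" point concrete, here is a rough sketch that counts sends under different batch size limits. It uses the same simplified model as the walkthrough (send once the running total exceeds the limit). The helper `count_sends` and the workload of 100 messages of 1000 bytes each are assumptions for illustration only.

```python
def count_sends(message_sizes, batch_size_limit):
    """Count sends in the simplified model: send once the total exceeds the limit."""
    sends = 0
    collected = 0
    for size in message_sizes:
        collected += size
        if collected > batch_size_limit:
            sends += 1
            collected = 0
    # Leftover bytes still need one final send (for example on flush).
    if collected > 0:
        sends += 1
    return sends


sizes = [1000] * 100  # hypothetical workload: 100 messages of 1000 bytes
for limit in (4096, 16384, 65536):
    print(f"batch_size={limit}: {count_sends(sizes, limit)} sends")
# batch_size=4096: 20 sends
# batch_size=16384: 6 sends
# batch_size=65536: 2 sends
```

Fewer sends means fewer network requests, but each message waits longer in the batch before it leaves the producer, which is the latency side of the tradeoff.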
Full Transcript
This visual execution shows how a Kafka producer collects messages into batches. It keeps adding message sizes until the batch size limit is exceeded. When that happens, it compresses the batch using gzip and sends it to the broker. After sending, the batch size counter resets to zero to start collecting new messages. This process repeats until no more messages are left. Adjusting the batch size changes when batches are sent, and compression reduces network load.
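As a starting point for tuning, the settings can be grouped into profiles and passed to the producer. The sketch below assumes the kafka-python client. The profile names, the specific values, and the `make_producer` helper are illustrative assumptions, not recommendations. `linger_ms` is the kafka-python setting that bounds how long a partially filled batch waits before it is sent.

```python
# Illustrative profiles only; measure with your own workload before adopting values.
THROUGHPUT_PROFILE = {
    "batch_size": 65536,        # larger batches, fewer requests
    "linger_ms": 20,            # wait up to 20 ms for a batch to fill
    "compression_type": "gzip",
}

LOW_LATENCY_PROFILE = {
    "batch_size": 16384,        # kafka-python's default batch size
    "linger_ms": 0,             # do not wait for more messages
    "compression_type": None,   # skip compression CPU cost
}


def make_producer(bootstrap_servers, profile):
    """Hypothetical helper: build a KafkaProducer from a profile dict."""
    from kafka import KafkaProducer  # requires the kafka-python package

    return KafkaProducer(bootstrap_servers=bootstrap_servers, **profile)
```

With a broker available, `make_producer("localhost:9092", THROUGHPUT_PROFILE)` would return a producer configured for larger, compressed batches; the broker address here is an assumption.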