What is batch.size in Kafka Producer and How It Works
batch.size in the Kafka producer is the maximum size, in bytes, of a batch of records bound for a single partition. It caps how many bytes of messages the producer groups together before sending them to the broker, which improves throughput by reducing the number of network requests.
How It Works
Imagine you are mailing letters. Instead of sending each letter individually, you put several letters in one envelope to save time and postage. batch.size works similarly for Kafka producers. It sets the maximum size of a group (batch) of messages that the producer collects before sending them to the Kafka broker.
The producer sends a batch when it reaches this size or when the linger.ms wait time expires, whichever comes first, and all messages in the batch go out together. This reduces the number of network trips and improves efficiency, especially when many small messages are produced.
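The size-or-timeout rule described above can be sketched as a small simulation. This is a simplified model, not the Kafka client's actual implementation: the class name BatchingSketch and the method simulate are hypothetical, the timeout is only checked when the next record arrives, and per-batch overhead, compression, and in-flight requests are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of per-partition batching. This is NOT the Kafka client's
// real code; the class and method names are hypothetical.
final class BatchingSketch {

    /**
     * Returns the number of records in each batch that would be sent.
     * A batch closes when the next record would push it past batchSizeBytes,
     * or when lingerMs has elapsed since the batch's first record arrived.
     */
    static List<Integer> simulate(long[] arrivalMs, int[] sizeBytes,
                                  int batchSizeBytes, long lingerMs) {
        List<Integer> batches = new ArrayList<>();
        int count = 0;      // records in the open batch
        int bytes = 0;      // bytes in the open batch
        long openedAt = 0;  // arrival time of the open batch's first record
        for (int i = 0; i < arrivalMs.length; i++) {
            boolean full = count > 0 && bytes + sizeBytes[i] > batchSizeBytes;
            boolean expired = count > 0 && arrivalMs[i] - openedAt >= lingerMs;
            if (full || expired) {
                batches.add(count);
                count = 0;
                bytes = 0;
            }
            if (count == 0) {
                openedAt = arrivalMs[i];
            }
            count++;
            bytes += sizeBytes[i];
        }
        if (count > 0) {
            batches.add(count); // flush whatever is left
        }
        return batches;
    }

    public static void main(String[] args) {
        long[] arrivalMs = new long[10];
        int[] sizeBytes = new int[10];
        for (int i = 0; i < 10; i++) {
            arrivalMs[i] = i;   // one record per millisecond
            sizeBytes[i] = 400; // 400 bytes each
        }
        // Small batch size: the size limit closes batches.
        System.out.println(simulate(arrivalMs, sizeBytes, 1000, 100)); // [2, 2, 2, 2, 2]
        // Large batch size, short wait: the timeout closes batches.
        System.out.println(simulate(arrivalMs, sizeBytes, 16384, 3));  // [3, 3, 3, 1]
    }
}
```

In the first call the 1000-byte limit is the binding constraint, so each batch holds two 400-byte records. In the second call the batch never fills, so the 3 ms wait decides when each batch goes out.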
Example
This example shows how to set batch.size in a Kafka producer configuration using Java.
```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class KafkaBatchSizeExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // Upper bound, in bytes, for one per-partition batch (16 KB is also the default)
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        // send() is asynchronous: records are buffered and sent in batches
        for (int i = 0; i < 10; i++) {
            producer.send(new ProducerRecord<>("my-topic", "key" + i, "value" + i));
        }

        // close() waits for buffered records to be sent before shutting down
        producer.close();
    }
}
```
When to Use
Use batch.size to improve producer throughput when sending many small messages. Increasing the batch size lets the producer send more data in one network call, reducing overhead.
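However, a larger batch takes longer to fill, so when linger.ms is greater than zero, records may wait up to that time before being sent, which increases latency. The producer also allocates buffer memory per batch, so a very large value can use memory wastefully. For low-latency needs, keep it smaller. For high throughput and less concern about delay, increase it.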
Typical use cases include log aggregation, metrics collection, or any scenario where many small messages are produced rapidly.
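The throughput versus latency trade-off above can be expressed as two configuration profiles. This is a minimal sketch: the class and method names are hypothetical, and the values 65536 and 20 are illustrative assumptions, not recommendations. Only the keys batch.size and linger.ms are real producer settings, and the sketch uses plain java.util.Properties so it runs without the Kafka client on the classpath.

```java
import java.util.Properties;

// Hypothetical helper that builds two tuning profiles for a producer.
final class ProducerTuningSketch {

    /** Larger batches and a short wait: favors throughput over latency. */
    static Properties throughputProfile() {
        Properties props = new Properties();
        props.setProperty("batch.size", "65536"); // 64 KB, illustrative value
        props.setProperty("linger.ms", "20");     // wait up to 20 ms, illustrative value
        return props;
    }

    /** Default-sized batches and no deliberate wait: favors latency. */
    static Properties lowLatencyProfile() {
        Properties props = new Properties();
        props.setProperty("batch.size", "16384"); // 16 KB
        props.setProperty("linger.ms", "0");      // do not wait for more records
        return props;
    }

    public static void main(String[] args) {
        Properties throughput = throughputProfile();
        Properties lowLatency = lowLatencyProfile();
        System.out.println("throughput batch.size = " + throughput.getProperty("batch.size"));
        System.out.println("low-latency linger.ms = " + lowLatency.getProperty("linger.ms"));
    }
}
```

Either profile would still need the bootstrap servers and serializer settings added before being passed to a KafkaProducer. Measuring with real traffic is the only reliable way to pick final values.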
Key Points
- batch.size is the max size in bytes for a batch of messages.
- It helps reduce network calls by grouping messages.
- A batch size that is too large can increase latency.
- Adjust based on throughput vs latency needs.
- Works together with linger.ms to control batching behavior.
Key Takeaways
- batch.size controls the max bytes of messages sent together by the Kafka producer.
- Adjust batch.size based on your application's speed and delay requirements.
- It works with linger.ms to decide when batches are sent.