
Exactly-once stream processing in Kafka - Commands & Configuration

Introduction
When processing data streams, sometimes messages can be processed more than once due to failures or retries. Exactly-once stream processing ensures each message affects the system only once, avoiding duplicates and data errors.
When you need to update a database from a stream without creating duplicate records.
When processing financial transactions where double processing causes incorrect balances.
When aggregating metrics from event streams and accuracy is critical.
When retrying failed processing but want to avoid counting the same event multiple times.
When building real-time dashboards that must reflect precise data without duplication.
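The stakes described above can be seen in a few lines of plain Java. This is a hypothetical, Kafka-free simulation (class and method names are invented for illustration): the same event is delivered twice because of a retry, and naive counting inflates the total while deduplicating by event ID keeps it correct, which is the observable effect exactly-once processing is meant to guarantee.

```java
import java.util.*;

// Kafka-free simulation of why exactly-once matters: "payment-1" is
// redelivered after a simulated retry, and we compare naive counting
// against deduplication by event ID.
public class ExactlyOnceDemo {

    // At-least-once style: every delivery is counted, so duplicates inflate totals.
    static Map<String, Long> countAtLeastOnce(List<String> deliveries) {
        Map<String, Long> counts = new HashMap<>();
        for (String eventId : deliveries) {
            counts.merge(eventId, 1L, Long::sum);
        }
        return counts;
    }

    // Exactly-once style: each event ID affects the result only once.
    static Map<String, Long> countExactlyOnce(List<String> deliveries) {
        Map<String, Long> counts = new HashMap<>();
        Set<String> seen = new HashSet<>();
        for (String eventId : deliveries) {
            if (seen.add(eventId)) {       // add() returns true only the first time
                counts.merge(eventId, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> deliveries = List.of("payment-1", "payment-2", "payment-1");
        System.out.println("at-least-once: " + countAtLeastOnce(deliveries)); // payment-1 counted twice
        System.out.println("exactly-once:  " + countExactlyOnce(deliveries)); // each counted once
    }
}
```

Kafka Streams with exactly-once enabled achieves this effect transactionally rather than by tracking IDs, but the end result for an aggregation is the same: a retried event changes the totals only once.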
Config File - kafka-streams.properties
application.id=exactly-once-app
bootstrap.servers=localhost:9092
processing.guarantee=exactly_once_v2
cache.max.bytes.buffering=0
commit.interval.ms=100
acks=all

application.id: Identifies the Kafka Streams application uniquely.

bootstrap.servers: Kafka broker addresses to connect to.

processing.guarantee: Enables exactly-once semantics. The exactly_once_v2 value (the recommended implementation since Kafka 3.0; requires brokers on 2.5 or newer) replaces the deprecated exactly_once setting.

cache.max.bytes.buffering: Set to 0 to disable record caching so every update is forwarded downstream immediately, which makes the results easy to verify.

commit.interval.ms: How often processed offsets are committed; 100 ms is also the default once exactly-once is enabled.

acks: Requires acknowledgement from all in-sync replicas before a write succeeds. Under exactly_once_v2, Kafka Streams enforces acks=all on its internal producers automatically, so this line makes the requirement explicit rather than adding it.
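The same settings can be supplied programmatically instead of through a properties file. This sketch uses only java.util.Properties; in a real application this object would be handed to the KafkaStreams constructor. The keys are the real Kafka configuration names; the helper method itself is invented for this example.

```java
import java.util.Properties;

// Programmatic equivalent of kafka-streams.properties above.
public class StreamsProps {

    static Properties exactlyOnceConfig() {
        Properties props = new Properties();
        props.put("application.id", "exactly-once-app");
        props.put("bootstrap.servers", "localhost:9092");
        props.put("processing.guarantee", "exactly_once_v2"); // EOS v2, brokers 2.5+
        props.put("cache.max.bytes.buffering", "0");          // forward results immediately
        props.put("commit.interval.ms", "100");               // commit every 100 ms
        props.put("acks", "all");                             // enforced anyway under EOS v2
        return props;
    }
}
```

Using string keys keeps the sketch dependency-free; with the kafka-streams library on the classpath, the StreamsConfig constants (for example StreamsConfig.PROCESSING_GUARANTEE_CONFIG) are the safer choice because typos become compile-time errors.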

Commands
Create an input topic with 3 partitions to distribute the stream data.
Terminal
kafka-topics --create --topic input-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
Expected Output
Created topic input-topic.
--partitions - Number of partitions for parallelism
--replication-factor - Number of copies for fault tolerance
Create an output topic where processed results will be written exactly once.
Terminal
kafka-topics --create --topic output-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
Expected Output
Created topic output-topic.
--partitions - Number of partitions for parallelism
--replication-factor - Number of copies for fault tolerance
Send sample messages to the input topic to simulate streaming data.
Terminal
kafka-console-producer --topic input-topic --bootstrap-server localhost:9092
Expected Output
> (interactive prompt; each line typed is sent as one message; press Ctrl+C to exit)
--topic - Topic to send messages to
--bootstrap-server - Kafka broker address
Run the Kafka Streams application configured for exactly-once processing to consume from input-topic and produce to output-topic.
Terminal
java -jar exactly-once-streams-app.jar --config kafka-streams.properties
Expected Output
INFO KafkaStreams: Started application exactly-once-app
INFO KafkaStreams: State transition from CREATED to RUNNING
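What the running application does under the hood can be modeled without Kafka at all. The sketch below is a hypothetical in-memory model (all names invented) of the transactional read-process-write loop that exactly_once_v2 implements: the output record and the new input offset are committed atomically, so a crash before the commit replays the input record without ever duplicating the output.

```java
import java.util.*;

// In-memory model of a transactional read-process-write loop:
// output and offset advance together or not at all.
public class TxnLoopModel {

    final List<String> outputTopic = new ArrayList<>();
    int committedOffset = 0; // last committed input position

    // Process one record; commit output + offset atomically unless we "crash" first.
    void poll(List<String> inputTopic, boolean crashBeforeCommit) {
        if (committedOffset >= inputTopic.size()) return;
        String record = inputTopic.get(committedOffset);
        String result = "processed-" + record;
        if (crashBeforeCommit) {
            return; // transaction aborted: neither the output nor the offset is committed
        }
        // Atomic commit: the output record and the offset advance together.
        outputTopic.add(result);
        committedOffset++;
    }

    public static void main(String[] args) {
        TxnLoopModel app = new TxnLoopModel();
        List<String> input = List.of("message-1", "message-2", "message-3");
        app.poll(input, false);
        app.poll(input, true);  // crash: message-2 is retried on the next poll, not duplicated
        app.poll(input, false);
        app.poll(input, false);
        // prints [processed-message-1, processed-message-2, processed-message-3]
        System.out.println(app.outputTopic);
    }
}
```

If output and offset were committed separately (at-least-once), a crash between the two writes would leave the output written but the offset unadvanced, and the retry would produce a duplicate; making the two a single atomic step is the core of the exactly-once guarantee.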
Read the processed messages from the output topic to verify exactly-once processing results.
Terminal
kafka-console-consumer --topic output-topic --bootstrap-server localhost:9092 --from-beginning
Expected Output
processed-message-1
processed-message-2
processed-message-3
--from-beginning - Read all messages from the start
Key Concept

If you remember nothing else from this pattern, remember: exactly-once processing guarantees each message affects the system only once, preventing duplicates even if retries happen.

Common Mistakes
Not setting processing.guarantee to exactly_once_v2 in the configuration.
Without this setting, Kafka Streams defaults to at-least-once processing, which can cause duplicate processing.
Always set processing.guarantee=exactly_once_v2 to enable exactly-once semantics.
Using acks=1 instead of acks=all in producer configuration.
acks=1 risks data loss if the leader broker fails before replicas confirm, breaking exactly-once guarantees.
Set acks=all to ensure all replicas confirm writes for durability.
Enabling caching (cache.max.bytes.buffering > 0) when immediate output is expected.
Caching buffers and coalesces records before forwarding them downstream; the exactly-once guarantee still holds, but output is delayed and intermediate updates may be merged, which makes results hard to verify promptly.
Set cache.max.bytes.buffering=0 when every update must be emitted immediately.
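The mistakes above can be caught at startup before the application connects to any broker. This is a sketch of a hypothetical guard helper (the class and method are invented; only the configuration key names are real Kafka config names), assuming missing acks and caching keys fall back to safe defaults.

```java
import java.util.Properties;

// Startup guard that rejects configurations exhibiting the common mistakes above.
public class ConfigGuard {

    static void requireExactlyOnce(Properties props) {
        if (!"exactly_once_v2".equals(props.getProperty("processing.guarantee"))) {
            throw new IllegalStateException(
                "processing.guarantee must be exactly_once_v2, not the at-least-once default");
        }
        // Missing keys fall back to safe values here, since EOS v2 enforces acks=all anyway.
        if (!"all".equals(props.getProperty("acks", "all"))) {
            throw new IllegalStateException("acks must be all for durable writes");
        }
        if (!"0".equals(props.getProperty("cache.max.bytes.buffering", "0"))) {
            throw new IllegalStateException("disable caching so output can be verified immediately");
        }
    }
}
```

Calling this once before constructing the streams application turns a silent downgrade to at-least-once processing into an immediate, explicit failure.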
Summary
Create input and output Kafka topics with proper partitions and replication.
Configure Kafka Streams with processing.guarantee=exactly_once_v2 and acks=all for durability.
Run the Kafka Streams app to process messages exactly once from input to output topic.
Use console producer and consumer to send and verify messages ensuring no duplicates.