Kafka · DevOps · ~10 mins

Exactly-once stream processing in Kafka - Step-by-Step Execution

Process Flow - Exactly-once stream processing
Start Stream Processing
Read Input Record
Process Record
Begin Transaction
Write Output Record
Acknowledge Input
Commit Transaction
Next Record or End
The flow reads each record, processes it, writes the output inside a transaction, acknowledges the input by sending its offsets to the same transaction, commits so that output and offsets are saved atomically, then moves on to the next record.
Execution Sample
Kafka
producer.initTransactions();                          // one-time setup per transactional producer
producer.beginTransaction();
producer.send(outputRecord);                          // buffered inside the transaction, not yet visible
// offsetsToCommit: Map<TopicPartition, OffsetAndMetadata> of the next offsets to consume
producer.sendOffsetsToTransaction(offsetsToCommit, consumer.groupMetadata());
producer.commitTransaction();                         // output and offsets become visible atomically
This code snippet shows how a Kafka producer uses transactions, including output records and input offsets, to ensure exactly-once stream processing.
Process Table
| Step | Action | Transaction State | Output Produced | Input Acknowledged |
|------|--------|-------------------|-----------------|--------------------|
| 1 | Initialize transactions | Initialized | None | No |
| 2 | Begin transaction | Active | None | No |
| 3 | Send output record | Active | Output record buffered | No |
| 4 | Send offsets to transaction | Active | Output record buffered | No |
| 5 | Commit transaction | Committed | Output record committed | Yes |
| 6 | Process next record | None | None | No |
💡 Processing stops when no more input records are available.
Status Tracker
| Variable | Start | After Step 2 | After Step 4 | After Step 5 | Final |
|----------|-------|--------------|--------------|--------------|-------|
| transactionState | None | Active | Active | Committed | None |
| outputBuffer | Empty | Empty | Buffered output record | Committed output record | Empty |
| inputAcknowledged | No | No | No | Yes | No |
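The state transitions in the tracker above can be sketched as a small in-memory simulation. The classes and method names here are hypothetical illustrations of the semantics, not the Kafka client API:

```java
// Minimal sketch of one exactly-once transaction cycle (hypothetical, not kafka-clients).
import java.util.ArrayList;
import java.util.List;

public class TxnCycle {
    enum TxnState { NONE, INITIALIZED, ACTIVE, COMMITTED }

    static TxnState state = TxnState.NONE;
    static List<String> buffered = new ArrayList<>();   // output sent but not yet committed
    static List<String> committed = new ArrayList<>();  // output visible to read_committed consumers
    static boolean inputAcknowledged = false;

    static void initTransactions() { state = TxnState.INITIALIZED; }
    static void beginTransaction()  { state = TxnState.ACTIVE; }
    static void send(String record) { buffered.add(record); }  // buffered, not visible yet
    static void sendOffsetsToTransaction() { /* input offsets join the same transaction */ }
    static void commitTransaction() {
        committed.addAll(buffered);   // output and offsets become visible atomically
        buffered.clear();
        inputAcknowledged = true;     // commit is what acknowledges the input
        state = TxnState.COMMITTED;
    }

    public static void main(String[] args) {
        initTransactions();           // step 1
        beginTransaction();           // step 2
        send("output-record");        // step 3: buffered, not committed
        sendOffsetsToTransaction();   // step 4
        commitTransaction();          // step 5: output + offsets saved together
        System.out.println(state + " " + committed + " " + inputAcknowledged);
    }
}
```

Running `main` walks steps 1 through 5: the output stays buffered through step 4 and only becomes visible, together with the input acknowledgment, at commit.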
Key Moments - 3 Insights
Why do we need to begin and commit a transaction around sending output?
Because the transaction groups the output and input acknowledgment so they happen exactly once together, as shown in steps 2 to 5 in the execution table.
What happens if the transaction is not committed?
The output record is not saved permanently and the input is not acknowledged, so the record will be reprocessed; because the aborted output is never visible, there is no data loss and no duplicate.
Why send offsets to the transaction before committing?
Sending offsets before commit ensures that the output and input acknowledgment are stored atomically, preventing data loss or duplicates (see step 5).
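The retry behavior described in these insights can be sketched the same way. This hypothetical simulation (not the Kafka API) shows a crash before commit followed by a successful retry, with exactly one copy of the output surviving:

```java
// Hypothetical sketch of the abort-and-retry path (not kafka-clients).
import java.util.ArrayList;
import java.util.List;

public class AbortPath {
    // One failed attempt (no commit) followed by a successful retry.
    static List<String> run() {
        List<String> input = List.of("record-1");
        List<String> committedOutput = new ArrayList<>();
        int committedOffset = 0;   // last committed read position

        // Attempt 1: process and buffer output, then crash before commit.
        List<String> buffered = new ArrayList<>();
        buffered.add("processed-" + input.get(committedOffset));
        buffered.clear();          // abort: uncommitted output is discarded
                                   // and committedOffset is NOT advanced

        // Attempt 2: re-read from the last committed offset and retry.
        buffered.add("processed-" + input.get(committedOffset));
        committedOutput.addAll(buffered);   // commit: output + offset saved together
        committedOffset = input.size();

        return committedOutput;    // exactly one copy: no loss, no duplicate
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because the offset only advances at commit, the failed attempt is invisible: the consumer re-reads the same record, and the committed output ends up containing it exactly once.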
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table: what is the transaction state after step 3?
A. Initialized
B. Committed
C. Active
D. None
💡 Hint
Check the 'Transaction State' column at step 3 in the execution table.
At which step is the input record acknowledged?
A. Step 2
B. Step 5
C. Step 4
D. Step 6
💡 Hint
Look at the 'Input Acknowledged' column in the execution table.
If the transaction is not committed, what happens to the output record?
A. It is buffered but not saved
B. It is permanently saved
C. It is deleted immediately
D. It is acknowledged as sent
💡 Hint
Refer to the 'Output Produced' column before and after commit in the execution table.
Concept Snapshot
Exactly-once stream processing in Kafka uses transactions.
Start a transaction before sending output.
Send output records inside the transaction.
Send input offsets to the transaction.
Commit the transaction to save output and offsets atomically.
This ensures each input is processed exactly once.
Full Transcript
Exactly-once stream processing means each input record is processed exactly one time, with no duplicates and no losses. Kafka achieves this with transactions. The process starts by initializing transactions. For each input record, a transaction begins. The output record is sent inside this transaction but is not yet visible to consumers reading committed data. The input offsets are sent to the same transaction. When the transaction commits, the output and the offsets are saved atomically. If a failure happens before the commit, the output is discarded and the input is not acknowledged, so the record will be retried. This flow guarantees exactly-once processing by linking the output and the input acknowledgment in one atomic transaction.