GroupBy and aggregation in Kafka - Step-by-Step Execution

Process Flow - GroupBy and aggregation
Input Stream of Records → Group records by key → Apply aggregation function → Emit aggregated results → Output Stream
Data flows in, records are grouped by a key, then an aggregation function combines values per group, producing summarized output.
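The three stages described above can be modeled in plain Java as a rough sketch. This is a batch analogue built on `java.util.stream`, not the Kafka Streams API; the class name `GroupAndSumSketch` and the sample records (taken from the trace in this walkthrough) are illustrative assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java model of the flow; hypothetical helper, not the Kafka Streams API.
class GroupAndSumSketch {

    // Group (key, value) records by key, then sum the values in each group.
    static Map<String, Integer> groupAndSum(List<Map.Entry<String, Integer>> records) {
        return records.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,                            // group records by key
                TreeMap::new,                                 // sorted keys for stable printing
                Collectors.summingInt(Map.Entry::getValue))); // aggregation function
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = List.of(
                Map.entry("A", 5), Map.entry("B", 3), Map.entry("A", 7),
                Map.entry("B", 2), Map.entry("A", 1));
        System.out.println(groupAndSum(input)); // prints {A=13, B=5}
    }
}
```

Unlike this batch version, Kafka Streams processes an unbounded stream and updates each group's result incrementally as records arrive.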
Execution Sample
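Java (Kafka Streams)
// Assumes: StreamsBuilder builder = new StreamsBuilder();
// Explicit serdes so Integer values do not depend on the configured defaults.
KStream<String, Integer> stream =
  builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.Integer()));
KTable<String, Integer> aggregated = stream
  .groupByKey()                    // group records by their existing key
  .reduce((v1, v2) -> v1 + v2);    // keep a running sum per key
aggregated.toStream()
  .to("output-topic", Produced.with(Serdes.String(), Serdes.Integer()));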
This code groups records by key and sums their integer values, then writes results to an output topic.
Process Table
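| Step | Input Record (key, value) | GroupBy Key | Aggregation Function | Aggregated Value | Output Emitted |
|------|---------------------------|-------------|----------------------|------------------|----------------|
| 1 | (A, 5) | A | Initial value 5 | 5 | Yes (A, 5) |
| 2 | (B, 3) | B | Initial value 3 | 3 | Yes (B, 3) |
| 3 | (A, 7) | A | 5 + 7 | 12 | Yes (A, 12) |
| 4 | (B, 2) | B | 3 + 2 | 5 | Yes (B, 5) |
| 5 | (A, 1) | A | 12 + 1 | 13 | Yes (A, 13) |
| 6 | No more input | - | - | - | None |
💡 No more input records arrive in this trace, so no further updates are emitted. A running Kafka Streams application would keep waiting for new records.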
Status Tracker
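| Variable | Start | After 1 | After 2 | After 3 | After 4 | After 5 | Final |
|----------|-------|---------|---------|---------|---------|---------|-------|
| Aggregated Value for A | null | 5 | 5 | 12 | 12 | 13 | 13 |
| Aggregated Value for B | null | null | 3 | 3 | 5 | 5 | 5 |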
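The per-step values in the process table and status tracker can be reproduced with a small plain-Java simulation. This is only a sketch of the update-per-record behavior, assuming no caching or batching; `ReduceTraceSketch` and `trace` are hypothetical names, and a `HashMap` stands in for the state store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the trace; hypothetical helper, not the Kafka Streams API.
class ReduceTraceSketch {

    // Apply records one at a time and return the update emitted after each one.
    static List<String> trace(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> state = new HashMap<>(); // one running total per key
        List<String> emitted = new ArrayList<>();
        for (Map.Entry<String, Integer> r : records) {
            // The first value for a key is stored as-is; later values are added to it.
            int total = state.merge(r.getKey(), r.getValue(), Integer::sum);
            emitted.add("(" + r.getKey() + "," + total + ")");
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = List.of(
                Map.entry("A", 5), Map.entry("B", 3), Map.entry("A", 7),
                Map.entry("B", 2), Map.entry("A", 1));
        System.out.println(trace(input)); // prints [(A,5), (B,3), (A,12), (B,5), (A,13)]
    }
}
```

Each element of the returned list corresponds to one row of the "Output Emitted" column for steps 1 through 5.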
Key Moments - 3 Insights
Why is there output after the first record?
reduce() stores the first value for a key as the initial aggregate, without calling the aggregation function, and emits it. Each later record for that key applies the function to the stored aggregate and the new value, then emits the updated result.
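To make this concrete, the sketch below counts how often the reducer actually runs. It is a plain-Java model in which `Map.merge` plays the role of the state update; `ReducerCallCountSketch` and `countReducerCalls` are hypothetical names, not part of Kafka Streams.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BinaryOperator;

// Plain-Java model; hypothetical helper, not the Kafka Streams API.
class ReducerCallCountSketch {

    // Return how many times the summing reducer is invoked for the given records.
    static int countReducerCalls(List<Map.Entry<String, Integer>> records) {
        AtomicInteger calls = new AtomicInteger();
        BinaryOperator<Integer> reducer = (v1, v2) -> {
            calls.incrementAndGet(); // record each invocation
            return v1 + v2;
        };
        Map<String, Integer> state = new HashMap<>();
        for (Map.Entry<String, Integer> r : records) {
            // merge() stores the value directly when the key is new,
            // and calls the reducer only when a previous value exists.
            state.merge(r.getKey(), r.getValue(), reducer);
        }
        return calls.get();
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = List.of(
                Map.entry("A", 5), Map.entry("B", 3), Map.entry("A", 7),
                Map.entry("B", 2), Map.entry("A", 1));
        System.out.println(countReducerCalls(input)); // prints 3
    }
}
```

Five records across two keys produce three reducer calls, because the first record for each key only initializes that key's aggregate.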
How does the aggregation function combine values?
It adds the new value to the current aggregated value for the key, as shown in steps 3, 4, and 5 of the process table.
What happens when no more input records arrive?
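No further aggregation updates are emitted, as shown in step 6 and the note under the process table. A running Kafka Streams application does not terminate at this point; it keeps waiting for new records.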
Visual Quiz - 3 Questions
Test your understanding
In the process table, what is the aggregated value for key 'A' after step 3?
A. 5
B. 7
C. 12
D. 13
💡 Hint
Check the 'Aggregated Value' column at step 3 for key 'A'.
At which step does the output for key 'B' first get emitted?
A. Step 4
B. Step 2
C. Step 3
D. Step 5
💡 Hint
Look at the 'Output Emitted' column for key 'B' in the process table.
If the aggregation function were changed to multiply values instead of adding them, what would be the aggregated value for key 'A' after step 5?
A. 105
B. 35
C. 65
D. 13
💡 Hint
Multiply values 5 * 7 * 1 for key 'A' from steps 1, 3, and 5.
Concept Snapshot
GroupBy and aggregation in Kafka Streams:
- Use groupByKey() to group records by key.
- Apply aggregation like reduce() to combine values per key.
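- The aggregate is updated on each new record; with record caching enabled, consecutive updates to the same key may be combined before being emitted.
- Results form a KTable; toStream() converts its updates into a stream that can be written to a topic.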
- Useful for counting, summing, or other summaries.
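Beyond reduce(), Kafka Streams grouped streams also offer count() and aggregate(), where aggregate() takes an initializer and may produce a result type different from the input values. The sketch below models that idea in plain Java. It is a simplification (the real aggregator also receives the key), and `AggregateSketch` and its `aggregate` method are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Supplier;

// Plain-Java model of an initializer-based aggregation; not the Kafka Streams API.
class AggregateSketch {

    // Fold each key's values into a result of type R, starting from initializer.get().
    static <R> Map<String, R> aggregate(List<Map.Entry<String, Integer>> records,
                                        Supplier<R> initializer,
                                        BiFunction<Integer, R, R> aggregator) {
        Map<String, R> state = new TreeMap<>(); // sorted keys for stable printing
        for (Map.Entry<String, Integer> r : records) {
            R current = state.containsKey(r.getKey()) ? state.get(r.getKey()) : initializer.get();
            state.put(r.getKey(), aggregator.apply(r.getValue(), current));
        }
        return state;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = List.of(
                Map.entry("A", 5), Map.entry("B", 3), Map.entry("A", 7),
                Map.entry("B", 2), Map.entry("A", 1));
        // Counting: ignore the value and add one per record.
        Map<String, Long> counts = aggregate(input, () -> 0L, (value, count) -> count + 1);
        // Summing: start from zero and add each value.
        Map<String, Integer> sums = aggregate(input, () -> 0, (value, sum) -> sum + value);
        System.out.println(counts); // prints {A=3, B=2}
        System.out.println(sums);   // prints {A=13, B=5}
    }
}
```

The initializer is what distinguishes this from the reduce-style sum: the first record for a key is combined with the initial value instead of being stored as-is.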
Full Transcript
This visual trace shows how Kafka Streams processes records by grouping them by key and aggregating their values. Each input record updates the aggregated value for its key using the aggregation function. An updated result is emitted whenever a key's aggregate changes. The trace ends when no more input records arrive; a running application would keep waiting for new input.