0
0
Kafkadevops~10 mins

Log compaction in Kafka - Step-by-Step Execution

Choose your learning style9 modes available
Process Flow - Log compaction
Start: Kafka Topic with Log
New Messages Appended
Log grows with keys and values
Log Compaction Triggered
For each key, keep only latest message
Old duplicates removed
Compacted Log Ready for Consumers
Log compaction keeps only the latest message per key in a Kafka topic, removing older duplicates to save space and ensure consumers get the newest data.
Execution Sample
Kafka
key = "user1"
log = [
  ("user1", "data1"),
  ("user2", "data2"),
  ("user1", "data3")
]
compacted_log = {}
This code simulates a Kafka log with messages for keys and prepares for compaction by storing only the latest message per key.
Process Table
StepActionCurrent KeyCurrent ValueCompacted Log State
1Read first messageuser1data1{"user1": "data1"}
2Read second messageuser2data2{"user1": "data1", "user2": "data2"}
3Read third messageuser1data3{"user1": "data3", "user2": "data2"}
4Log compaction complete--{"user1": "data3", "user2": "data2"}
💡 All messages processed; compacted log keeps only latest value per key.
Status Tracker
VariableStartAfter Step 1After Step 2After Step 3Final
compacted_log{}{"user1": "data1"}{"user1": "data1", "user2": "data2"}{"user1": "data3", "user2": "data2"}{"user1": "data3", "user2": "data2"}
Key Moments - 2 Insights
Why does the value for 'user1' change after step 3?
Because log compaction keeps only the latest message per key, the value for 'user1' is updated from 'data1' to 'data3' as shown in execution_table step 3.
Does log compaction remove keys completely if they appear only once?
No, keys with a single message remain in the compacted log. Only older duplicates are removed, as seen in execution_table where 'user2' stays unchanged.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table at step 2, what keys are present in the compacted log?
A["user1"]
B["user1", "user2"]
C["user2"]
D[]
💡 Hint
Check the 'Compacted Log State' column at step 2 in execution_table.
At which step does the value for 'user1' get updated to 'data3'?
AStep 3
BStep 2
CStep 1
DStep 4
💡 Hint
Look at the 'Current Value' and 'Compacted Log State' columns in execution_table.
If a new message ("user2", "data4") is appended after step 3, what would happen in the compacted log?
ANo change to compacted log
B"user2" is removed
C"user2" value updates to "data4"
D"user1" value changes to "data4"
💡 Hint
Log compaction keeps the latest message per key, so new messages update the key's value.
Concept Snapshot
Log compaction in Kafka:
- Keeps only the latest message per key in a topic
- Removes older duplicates to save space
- Ensures consumers read the newest data
- Runs in background without stopping producers
- Useful for changelog or state topics
Full Transcript
Log compaction is a Kafka feature that keeps only the latest message for each key in a topic. As new messages arrive, older messages with the same key are removed during compaction. This process helps save storage and ensures consumers get the most recent data for each key. The example shows a log with messages for 'user1' and 'user2'. After reading all messages, the compacted log keeps only the latest value for 'user1' and the single value for 'user2'. This is done step-by-step by updating the compacted log dictionary. Key moments include understanding why values update and that keys with single messages remain. The quizzes test understanding of the compacted log state at different steps and how new messages affect it.