What Is a Compacted Topic in Kafka and How It Works
A compacted topic in Kafka is a topic whose cleanup policy retains at least the latest value for each unique key and, over time, removes older records with the same key. This saves storage and ensures the most recent update for each key is always available.
How It Works
Imagine a notebook where you write updates about different friends. Instead of keeping every single note, you erase old notes about a friend and keep only the latest one. A compacted topic works similarly in Kafka. It stores messages with keys and keeps only the newest message for each key, deleting older ones.
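This process is called log compaction. A background cleaner on the broker periodically rewrites older log segments and drops records that have been superseded by a newer record with the same key. Compaction is asynchronous and does not touch the active segment, so a consumer may still see several values for a key for a while; what Kafka guarantees is that the latest value for each key is never removed. This is useful when you want to keep a snapshot of data, like user profiles or settings, without storing every change.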
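The effect of compaction on a single partition can be sketched in a few lines of Python. This is only an in-memory model, not Kafka's implementation: the `Record` type and the `compact` function are hypothetical names introduced for illustration, and the sketch ignores segments and timing. It shows the outcome once compaction has caught up: the newest record per key survives, and surviving records keep their original offsets and order.

```python
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    """Hypothetical stand-in for a keyed message in one partition."""
    offset: int
    key: str
    value: str


def compact(log: List[Record]) -> List[Record]:
    """Return the log with only the highest-offset record kept for each key."""
    latest_offset: Dict[str, int] = {}
    for record in log:
        # Later records overwrite earlier ones, so this ends up holding
        # the offset of the newest record for every key.
        latest_offset[record.key] = record.offset
    # Keep records in their original order; offsets are not renumbered.
    return [r for r in log if latest_offset[r.key] == r.offset]


log = [
    Record(0, "user1", "Alice"),
    Record(1, "user2", "Bob"),
    Record(2, "user1", "Alicia"),
    Record(3, "user3", "Charlie"),
]

compacted = compact(log)
for r in compacted:
    print(r.offset, r.key, r.value)
# 1 user2 Bob
# 2 user1 Alicia
# 3 user3 Charlie
```

Note the gap at offset 0 after compaction: the superseded `user1` record is gone, but nothing else moves.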
Example
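# 1. Create a topic with log compaction enabled
bin/kafka-topics.sh --create --topic user-updates --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1 --config cleanup.policy=compact
# 2. Start a producer that parses "key:value" lines, then type the four messages (user1 is written twice)
bin/kafka-console-producer.sh --topic user-updates --bootstrap-server localhost:9092 --property parse.key=true --property key.separator=:
user1:Alice
user2:Bob
user1:Alicia
user3:Charlie
# 3. Read the topic from the beginning, printing keys.
#    Until the log cleaner has run, the older user1 record (Alice) may still appear.
bin/kafka-console-consumer.sh --topic user-updates --bootstrap-server localhost:9092 --from-beginning --property print.key=true --timeout-ms 5000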
When to Use
Use compacted topics when you need to keep the latest state of data identified by keys, such as user profiles, configurations, or inventory levels. They are great for systems that need to recover state quickly or synchronize data without replaying all changes.
For example, a user profile service can use a compacted topic to store the latest profile info per user. If the service restarts, it reads the compacted topic to get the current state without processing every update.
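The restart scenario above can be illustrated with a small Python sketch. It assumes the records have already been read from the topic into a list of `(key, value)` pairs; `rebuild_state` is a hypothetical helper, not part of any Kafka client library. The point is that replaying a compacted log yields the same final state as replaying the full history, with fewer records to process.

```python
from typing import Dict, Iterable, Tuple


def rebuild_state(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Replay keyed records in order; the last value seen for a key wins."""
    state: Dict[str, str] = {}
    for key, value in records:
        state[key] = value
    return state


# Every update ever written (what a non-compacted topic would hold).
full_history = [
    ("user1", "Alice"),
    ("user2", "Bob"),
    ("user1", "Alicia"),
    ("user3", "Charlie"),
]

# What remains once compaction has removed the superseded user1 record.
compacted_log = [
    ("user2", "Bob"),
    ("user1", "Alicia"),
    ("user3", "Charlie"),
]

state_from_full = rebuild_state(full_history)
state_from_compacted = rebuild_state(compacted_log)

print(state_from_compacted["user1"])            # Alicia
print(state_from_full == state_from_compacted)  # True
```

Because the replay logic is "last write wins", it also produces the correct state if compaction has not finished and some older records are still present.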
Key Points
- Compacted topics retain at least the latest message per key; older messages with the same key are removed in the background.
- They help save storage by removing old duplicates.
- Useful for state recovery and data synchronization.
- Configured by setting cleanup.policy=compact.