Overview - Log compaction
What is it?
Log compaction is a feature in Apache Kafka that retains the latest value for each unique message key in a topic. Instead of deleting old messages based on time or size, the cleaner removes older records that share a key, so only the most recent update for each key survives. A record with a null value acts as a tombstone, telling the cleaner to eventually remove that key entirely. The result is a compacted log that represents the current state of the data, which is useful for topics where the latest state matters more than the full change history.
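The idea can be illustrated with a minimal sketch (this simulates the concept only, not Kafka's actual cleaner; the keys, values, and `compact` helper are hypothetical): replay a keyed log and keep just the latest value per key, dropping keys whose latest value is a null tombstone.

```python
def compact(log):
    """Simulate log compaction: keep only the latest value per key."""
    latest = {}
    for key, value in log:
        latest[key] = value  # a newer record overwrites an older one
    # keys whose latest value is None were tombstoned: drop them entirely
    return [(k, v) for k, v in latest.items() if v is not None]

log = [
    ("user-1", "alice@old.example"),
    ("user-2", "bob@example.com"),
    ("user-1", "alice@new.example"),  # supersedes the first user-1 record
    ("user-2", None),                 # tombstone: delete user-2
]
print(compact(log))  # [('user-1', 'alice@new.example')]
```

A consumer reading this compacted log from the beginning still sees the current value for every live key, just without the intermediate history.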
Why it matters
Without log compaction, a topic that records every change ever made grows without bound, wasting storage and forcing consumers that rebuild state to replay the entire history. Log compaction solves this by keeping only the latest update per key, so storage stays proportional to the number of keys and a consumer can restore the current state by reading the compacted log once. This is crucial for use cases like caches, changelog-backed state stores, or configuration topics that need the current snapshot rather than the full change history.
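Compaction is enabled per topic through topic-level configuration. A sketch of the relevant settings (the property names are standard Kafka topic configs; the values shown are illustrative, not recommendations):

```
cleanup.policy=compact            # compact instead of time/size-based deletion
min.cleanable.dirty.ratio=0.5     # trigger cleaning once half the log is uncompacted
min.compaction.lag.ms=0           # minimum record age before it may be compacted
delete.retention.ms=86400000      # how long tombstones remain visible to consumers
```

Note that compaction runs on closed log segments in the background, so consumers may briefly see multiple records for a key until the cleaner catches up.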
Where it fits
Before learning log compaction, you should understand Kafka basics like topics, partitions, producers, and consumers. After mastering log compaction, you can explore Kafka's retention policies, exactly-once semantics, and stateful stream processing with Kafka Streams.