Overview - Retention policies (time-based, size-based)
What is it?
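Retention policies in Kafka control how long, or how much, data is kept in a topic before it is deleted. Time-based retention (the retention.ms topic setting) removes data older than a configured age, while size-based retention (retention.bytes) removes the oldest data once a partition's log grows past a configured size. The size limit applies to each partition, not to the topic as a whole. Kafka deletes whole log segments rather than individual messages, so some data can outlive the configured limit until its segment becomes eligible for deletion. Both limits can be set together, in which case data is removed as soon as either one is exceeded. These policies bound disk usage and run automatically in the background without manual intervention.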
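To make the two policies concrete, here is a minimal Python sketch of how a broker might pick segments to delete in a single partition. It is a simplified model, not Kafka's actual code: the Segment class and the segments_to_delete function are hypothetical names introduced for illustration, and the sketch assumes that the newest (active) segment is never deleted and that -1 means "no limit".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical stand-in for one log segment of a partition."""
    base_offset: int        # offset of the first message in the segment
    size_bytes: int         # size of the segment on disk
    max_timestamp_ms: int   # timestamp of the newest message in the segment


def segments_to_delete(
    segments: List[Segment],
    now_ms: int,
    retention_ms: int = -1,
    retention_bytes: int = -1,
) -> List[Segment]:
    """Return the segments a simplified retention check would remove.

    `segments` is ordered oldest-first. The last entry is treated as the
    active segment and is never deleted (an assumption of this sketch).
    A limit of -1 disables that policy.
    """
    doomed: List[Segment] = []

    # Time-based: drop leading segments whose newest message is too old.
    if retention_ms >= 0:
        for seg in segments[:-1]:
            if now_ms - seg.max_timestamp_ms > retention_ms:
                doomed.append(seg)
            else:
                break  # segments are ordered, so stop at the first keeper

    # Size-based: drop oldest segments only while the partition would
    # still hold at least retention_bytes after the deletion.
    if retention_bytes >= 0:
        remaining = [s for s in segments if s not in doomed]
        excess = sum(s.size_bytes for s in remaining) - retention_bytes
        for seg in remaining[:-1]:
            if excess - seg.size_bytes >= 0:
                doomed.append(seg)
                excess -= seg.size_bytes
            else:
                break

    return doomed


if __name__ == "__main__":
    log = [
        Segment(0, 100, 1_000),
        Segment(10, 100, 2_000),
        Segment(20, 100, 3_000),
        Segment(30, 50, 4_000),   # active segment
    ]
    by_time = segments_to_delete(log, now_ms=5_000, retention_ms=2_500)
    by_size = segments_to_delete(log, now_ms=5_000, retention_bytes=200)
    print([s.base_offset for s in by_time])  # [0, 10]
    print([s.base_offset for s in by_size])  # [0]
```

In the size-based run, only the oldest segment is removed even though 250 bytes remain against a 200-byte limit: deleting the next segment would drop the partition below the limit, so the sketch keeps it. This illustrates why the size setting behaves as a floor on retained data per partition rather than an exact cap.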
Why it matters
Without retention policies, Kafka topics would grow without bound, filling up disk space and eventually causing broker failures. Retention policies keep storage use manageable and predictable, so Kafka can run reliably. They also let you balance keeping enough history for consumers to read or replay against freeing up resources. This makes Kafka practical for real-world use where data volume is large and continuous.
Where it fits
Before learning retention policies, you should understand Kafka topics, partitions, and how Kafka stores messages in log segments. After mastering retention policies, you can explore log compaction, consumer groups, and data cleanup strategies. Retention policies are part of Kafka's data lifecycle management.