What is Retention in Kafka: Explanation and Usage

In Kafka, retention is the time or size limit that determines how long messages stay in a topic before they become eligible for deletion. Kafka enforces the limit automatically, removing old data according to the configured policy so that disk usage stays bounded.
⚙️

How It Works

Imagine Kafka as a giant message warehouse where messages are stored on shelves called topics. Retention is like a rule that says how long or how much stuff can stay on those shelves before it must be cleared out to make room for new messages.

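Kafka lets you set retention by time (for example, keep messages for 7 days, which is also the broker default) or by size (for example, keep up to 10 GB; note that the size limit applies to each partition, not to the topic as a whole). Once a limit is exceeded, Kafka automatically deletes the oldest data to free space. Deletion happens one log segment at a time, so some messages can remain on disk a little longer than the configured limit. This keeps disk usage under control without manual cleanup.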
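Both settings take raw numbers: retention.ms is in milliseconds and retention.bytes is in bytes. The sketch below shows one way to work out those values. The helper names days_to_ms and gib_to_bytes are made up for illustration and are not part of Kafka, and "10 GB" is assumed to mean 10 × 1024³ bytes.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Kafka) for computing retention values.

# Convert whole days to milliseconds, the unit retention.ms expects.
days_to_ms() {
  echo $(( $1 * 24 * 60 * 60 * 1000 ))
}

# Convert gibibytes to bytes, the unit retention.bytes expects.
# Assumption: "10 GB" is read as 10 * 1024^3 bytes.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

echo "retention.ms=$(days_to_ms 7)"        # 7 days
echo "retention.bytes=$(gib_to_bytes 10)"  # 10 GiB, enforced per partition
```

Running it prints retention.ms=604800000 and retention.bytes=10737418240, which can be passed to --add-config.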

💻

Example

This example sets a topic's retention time to 1 day (24 hours, or 86,400,000 ms) using Kafka's kafka-configs.sh command-line tool.

bash
kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --alter --add-config retention.ms=86400000
Output
Completed updating config for topic my-topic.
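After changing a setting, it can help to check what is actually applied, or to undo the override. The following is a minimal sketch using the --describe and --delete-config options of kafka-configs.sh. The function names and the KAFKA_CONFIGS and BOOTSTRAP variables are assumptions made for this example, and the broker address is the same localhost:9092 used above.

```shell
#!/bin/sh
# Path to the Kafka CLI; override KAFKA_CONFIGS if it is not on your PATH.
KAFKA_CONFIGS="${KAFKA_CONFIGS:-kafka-configs.sh}"
# Broker address (assumed; matches the example above).
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

# Show the config overrides currently set on a topic.
describe_topic_config() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP" \
    --entity-type topics --entity-name "$1" --describe
}

# Remove the retention.ms override so the topic falls back to the broker default.
reset_retention() {
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP" \
    --entity-type topics --entity-name "$1" \
    --alter --delete-config retention.ms
}

# Usage (needs a running broker):
#   describe_topic_config my-topic
#   reset_retention my-topic
```

Wrapping the commands in functions keeps the broker address in one place; the same commands work just as well typed directly into a terminal.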
🎯

When to Use

Use retention settings when you want to control how long Kafka keeps your messages. For example, if you only need recent data for analytics, set a short retention time to save disk space. If you need to keep data longer for auditing or replaying events, increase the retention time.

Retention is useful in real-world cases like log collection, where old logs can be deleted after a week, or in event sourcing, where you might keep events for months to rebuild system state.
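The two cases above could be configured along these lines. This is only a sketch: the helper set_retention_days, the topic names app-logs and order-events, and the 90-day figure for "months" are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Path to the Kafka CLI; override KAFKA_CONFIGS if it is not on your PATH.
KAFKA_CONFIGS="${KAFKA_CONFIGS:-kafka-configs.sh}"
# Broker address (assumed).
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

# Set a topic's time-based retention, given in whole days.
set_retention_days() {
  topic=$1
  days=$2
  # retention.ms is expressed in milliseconds.
  ms=$(( days * 24 * 60 * 60 * 1000 ))
  "$KAFKA_CONFIGS" --bootstrap-server "$BOOTSTRAP" \
    --entity-type topics --entity-name "$topic" \
    --alter --add-config "retention.ms=$ms"
}

# Usage (needs a running broker; topic names are made up):
#   set_retention_days app-logs 7        # log collection: one week
#   set_retention_days order-events 90   # event sourcing: about three months
```

Taking days as input avoids hand-typing long millisecond values, which is an easy place to drop or add a zero.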

Key Points

  • Retention controls how long or how much data Kafka keeps for a topic.
  • It can be set by time (retention.ms) or by size (retention.bytes, which applies to each partition).
  • Old messages are deleted automatically, one log segment at a time, once a limit is exceeded.
  • Proper retention settings help balance storage use and data availability.

Key Takeaways

  • Retention in Kafka defines how long or how much data is kept before deletion.
  • You can set retention by time or by size to manage disk space automatically.
  • Adjust retention to match your data needs, such as analytics or auditing.
  • Kafka deletes old messages once retention limits are exceeded.
  • Proper retention settings keep Kafka storage efficient and reliable.