
Retention policies (time-based, size-based) in Kafka - Commands & Configuration

Introduction
Kafka retention policies control how long data is kept in a topic, or how much of it, before the oldest data is deleted automatically. This keeps disk usage bounded without manual cleanup. Typical situations where you would set one:
When you want to keep messages only for a certain number of hours or days to save disk space.
When you want to limit the total size of data stored in a topic to avoid filling up your server.
When you need to automatically delete old messages after a set time without manual cleanup.
When you want to keep data until it reaches a size limit, then remove the oldest messages first.
When you want to balance between data availability and storage costs by controlling retention.
Config File - server.properties
server.properties
log.retention.hours=168
log.retention.bytes=1073741824

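log.retention.hours sets the broker-wide default time, in hours, to keep messages before they become eligible for deletion (168 hours = 7 days). If log.retention.minutes or log.retention.ms is also set, the finer-grained setting takes precedence.

log.retention.bytes sets the maximum size, in bytes, that each partition's log may reach before old data is deleted (1073741824 bytes = 1 GB). The limit applies per partition, not per topic or per broker.

Kafka deletes the oldest data when either limit is exceeded. Deletion happens one whole log segment at a time, so data can remain slightly beyond the limit until its segment is closed and removed.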
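The two values in the config file are easy to mistype, so it can help to derive them instead of writing them by hand. The following is a minimal sketch that only does the arithmetic and prints the two lines; the variable names are illustrative and it does not touch a broker.

```shell
# Derive the server.properties values from human-friendly units.
retention_days=7
retention_gib=1

# 7 days expressed in hours
retention_hours=$((retention_days * 24))
# 1 GB expressed in bytes (1024^3)
retention_bytes=$((retention_gib * 1024 * 1024 * 1024))

echo "log.retention.hours=${retention_hours}"
echo "log.retention.bytes=${retention_bytes}"
```

With the values shown, this prints log.retention.hours=168 and log.retention.bytes=1073741824, matching the config file above.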

Commands
Create a new Kafka topic named 'example-topic' with 1 partition and 1 replica to start using retention policies.
Terminal
kafka-topics.sh --create --topic example-topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
Expected Output
Created topic example-topic.
--topic - Name of the topic to create
--partitions - Number of partitions for the topic
--replication-factor - Number of replicas for fault tolerance
Set a time-based retention policy of 7 days (604800000 milliseconds) on 'example-topic'.
Terminal
kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name example-topic --alter --add-config retention.ms=604800000
Expected Output
Completed updating config for topic example-topic.
--entity-type topics - Specify that the config is for a topic
--entity-name example-topic - Target the specific topic
--alter - Change the existing configuration
--add-config retention.ms=604800000 - Set retention time to 7 days in milliseconds
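Because retention.ms is expressed in milliseconds, a small helper can make the intended duration explicit. This is a sketch only: days_to_ms is a hypothetical function name, not part of the Kafka tooling, and the snippet prints the value without contacting a cluster.

```shell
# Convert a whole number of days to milliseconds for retention.ms.
# days_to_ms is an illustrative helper, not a Kafka command.
days_to_ms() {
  echo $(( $1 * 24 * 60 * 60 * 1000 ))
}

retention_ms=$(days_to_ms 7)
echo "retention.ms=${retention_ms}"
```

For 7 days this prints retention.ms=604800000, the value passed to --add-config above.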
Set a size-based retention policy of 1 GB (1073741824 bytes) per partition on 'example-topic'.
Terminal
kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name example-topic --alter --add-config retention.bytes=1073741824
Expected Output
Completed updating config for topic example-topic.
--entity-type topics - Specify that the config is for a topic
--entity-name example-topic - Target the specific topic
--alter - Change the existing configuration
--add-config retention.bytes=1073741824 - Set the size limit to 1 GB per partition
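The two settings can also be applied in one call, since --add-config accepts a comma-separated list of key=value pairs. The sketch below only builds and prints the command as a dry run, so it is safe to try without a broker; the topic and bootstrap server are the ones used in this walkthrough, and the variable names are illustrative.

```shell
# Build a single kafka-configs.sh call that sets both limits (dry run: prints only).
topic="example-topic"
bootstrap="localhost:9092"

retention_ms=$((7 * 24 * 60 * 60 * 1000))   # 7 days in milliseconds
retention_bytes=$((1024 * 1024 * 1024))     # 1 GB in bytes

cmd="kafka-configs.sh --bootstrap-server ${bootstrap} --entity-type topics --entity-name ${topic} --alter --add-config retention.ms=${retention_ms},retention.bytes=${retention_bytes}"

# Print the command for review; run it manually once it looks right.
echo "$cmd"
```

Printing first and running second is a deliberate choice here: a wrong retention value can cause data to be deleted, so reviewing the exact command is cheap insurance.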
Check the current retention settings for 'example-topic' to verify the policies are applied.
Terminal
kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name example-topic --describe
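Expected Output
Configs for topic 'example-topic' are retention.ms=604800000,retention.bytes=1073741824
The exact layout of this output varies between Kafka versions; check that both retention.ms and retention.bytes appear with the values you set.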
--describe - Show current configuration for the topic
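For scripted checks, the value of a single key can be pulled out of the describe output. The sketch below assumes the output contains the setting as key=value text, as in the sample above; extract_config is a hypothetical helper, and the sample string stands in for real command output so the snippet runs without a cluster.

```shell
# Read describe output on stdin and print the value of one config key.
# extract_config is an illustrative helper, not part of Kafka.
extract_config() {
  grep -o "$1=[0-9-]*" | head -n 1 | cut -d= -f2
}

# Stand-in for: kafka-configs.sh ... --entity-name example-topic --describe
sample="Configs for topic 'example-topic' are retention.ms=604800000,retention.bytes=1073741824"

echo "$sample" | extract_config retention.ms
echo "$sample" | extract_config retention.bytes
```

In practice, the describe command would be piped into extract_config in place of the sample string, and the result compared against the value you intended to set.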
Key Concept

If you remember nothing else from this pattern, remember: Kafka deletes the oldest messages automatically once either the time limit or the size limit you set is exceeded, whichever happens first.

Common Mistakes
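Setting retention.ms or retention.bytes to zero or negative values without knowing what they mean
A value of -1 removes that limit entirely (retention.bytes defaults to -1, meaning no size limit), so data can grow until the disk fills. A value of 0 makes data eligible for deletion almost immediately.
Use positive values for the limits you want to enforce, and use -1 only when you deliberately want no limit.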
Not verifying the retention settings after applying them
You might think the policy is applied, but the command may have targeted a different topic or cluster, leaving the topic on the broker defaults from server.properties.
Use kafka-configs.sh --describe to confirm the retention policies are active on the topic.
Setting both retention.ms and retention.bytes to very high values without considering disk space
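Kafka keeps data until one of the limits is exceeded; if both are very high, the disk can fill up and cause broker failures. Remember that retention.bytes applies to each partition, so a topic's total footprint grows with its partition count and replication factor.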
Set realistic retention limits based on your storage capacity and data needs.
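A rough upper bound on a topic's disk footprint can be estimated before choosing limits. The sketch below assumes the size limit is the binding one and multiplies the per-partition limit by the number of partitions and replicas; the partition count and replication factor are made-up example values, and real usage can run somewhat higher because the active segment is not deleted.

```shell
# Rough cluster-wide disk estimate for one topic (example numbers, not a recommendation).
partitions=6
replication_factor=3
retention_bytes=1073741824   # per-partition size limit (1 GB)

total_bytes=$((partitions * replication_factor * retention_bytes))
total_gib=$((total_bytes / 1024 / 1024 / 1024))

echo "Approximate maximum across the cluster: ${total_bytes} bytes (${total_gib} GB)"
```

With these example numbers the estimate is 19327352832 bytes, or 18 GB, spread across the brokers that host the replicas.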
Summary
Create a Kafka topic to apply retention policies.
Use kafka-configs.sh to set time-based retention with retention.ms.
Use kafka-configs.sh to set size-based retention with retention.bytes.
Verify retention settings with kafka-configs.sh --describe.