
Retention policies (time-based, size-based) in Kafka - Step-by-Step Execution

Process Flow - Retention policies (time-based, size-based)
1. Message produced
2. Stored in partition
3. Check retention policy
   - Time-based? Delete if older than the configured time
   - Size-based? Delete oldest if the size limit is exceeded
4. Message retained or deleted
Messages are stored in partitions. Kafka checks if messages exceed time or size limits and deletes old messages accordingly.
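The flow above can be sketched as a small simulation. This is a simplified model under stated assumptions, not Kafka's actual implementation: real brokers remove whole log segments rather than individual messages, and the names `Message` and `apply_retention`, along with the message sizes, are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    timestamp_ms: int  # when the message was produced
    size_bytes: int    # size of the message on disk (illustrative)


def apply_retention(messages, now_ms, retention_ms, retention_bytes):
    """Return the messages that survive both retention checks.

    `messages` is ordered oldest first. A limit of -1 disables that check.
    """
    kept = list(messages)

    # Time-based check: drop messages older than retention_ms.
    if retention_ms >= 0:
        kept = [m for m in kept if now_ms - m.timestamp_ms <= retention_ms]

    # Size-based check: drop the oldest messages until the total fits.
    if retention_bytes >= 0:
        while kept and sum(m.size_bytes for m in kept) > retention_bytes:
            kept.pop(0)

    return kept


partition = [
    Message(timestamp_ms=0, size_bytes=500_000),
    Message(timestamp_ms=30_000, size_bytes=400_000),
    Message(timestamp_ms=61_000, size_bytes=300_000),
]
survivors = apply_retention(
    partition, now_ms=61_000, retention_ms=60_000, retention_bytes=1_048_576
)
print([m.timestamp_ms for m in survivors])  # [30000, 61000]
```

Here the message produced at 0 ms is 61000 ms old, so the time check removes it. The remaining 700000 bytes fit under the size limit, so the size check removes nothing further.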
Execution Sample
Kafka
retention.ms=60000
# 1 minute retention time (topic-level setting)

retention.bytes=1048576
# 1 MB max size (applies to each partition)
This config keeps a topic's messages for 1 minute, or until a partition grows beyond 1 MB, whichever limit is reached first.
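The numbers in the sample come from simple unit conversions: 1 minute is 60 × 1000 ms, and 1 MB is 1024 × 1024 bytes. A minimal sketch of that arithmetic, using the topic-level names `retention.ms` and `retention.bytes`; the helper `retention_config` is a hypothetical name, not part of any Kafka client.

```python
def retention_config(minutes, megabytes):
    """Build topic-level retention settings from human-friendly units."""
    return {
        "retention.ms": str(minutes * 60 * 1000),         # minutes -> milliseconds
        "retention.bytes": str(megabytes * 1024 * 1024),  # MB -> bytes
    }


config = retention_config(minutes=1, megabytes=1)
for key, value in config.items():
    print(f"{key}={value}")
# retention.ms=60000
# retention.bytes=1048576
```

Values are kept as strings because Kafka configuration entries are passed as key/value text.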
Process Table
Step | Action | Message Timestamp | Partition Size (bytes) | Retention Check | Result
1 | Message produced | 0 ms | 500000 | No deletion | Message stored
2 | Message produced | 30000 ms | 900000 | No deletion | Message stored
3 | Message produced | 61000 ms | 1200000 | Time check: 61000 > 60000 | Oldest message deleted
4 | Retention size check | N/A | 1100000 | Size check: 1100000 > 1048576 | Oldest message deleted
5 | Retention time check | 120000 ms | 900000 | Time check: 120000 > 60000 | Oldest message deleted
6 | Retention size check | N/A | 700000 | Size check: 700000 < 1048576 | No deletion
7 | No new messages | N/A | 700000 | No retention triggers | Partition stable
💡 With no new messages and the partition within both the time and size limits, no further retention deletions occur.
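The comparisons in the Retention Check column can be reproduced directly. The sketch below evaluates only the checks listed in the table against the limits from the sample config; `time_check` and `size_check` are hypothetical helper names.

```python
RETENTION_MS = 60_000        # time limit from the sample config
RETENTION_BYTES = 1_048_576  # size limit from the sample config


def time_check(age_ms):
    """True when the oldest message is older than the time limit."""
    return age_ms > RETENTION_MS


def size_check(partition_bytes):
    """True when the partition is larger than the size limit."""
    return partition_bytes > RETENTION_BYTES


results = {
    "step 3 (time)": time_check(61_000),
    "step 4 (size)": size_check(1_100_000),
    "step 5 (time)": time_check(120_000),
    "step 6 (size)": size_check(700_000),
}
for step, deletes in results.items():
    print(step, "-> delete oldest" if deletes else "-> no deletion")
```

Only step 6 falls under its limit, which matches the "No deletion" result in the table.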
Status Tracker
Variable | Start | After 1 | After 2 | After 3 | After 4 | After 5 | After 6 | Final
Partition Size (bytes) | 0 | 500000 | 900000 | 1200000 | 1100000 | 900000 | 700000 | 700000
Oldest Message Timestamp (ms) | 0 | 0 | 0 | 30000 | 30000 | 61000 | 120000 | 120000
Key Moments - 2 Insights
Why does Kafka delete the oldest message at step 3?
At step 3, the oldest message (timestamp 0 ms) is 61000 ms old, which exceeds the 60000 ms time-based retention limit, so Kafka deletes it regardless of partition size, as shown in row 3 of the Process Table.
Why is there a size-based deletion at step 4 after the time-based deletion?
After the time-based deletion, the partition size is still above the size limit (1100000 > 1048576), so Kafka deletes the oldest messages to reduce the size, as shown in row 4 of the Process Table.
Visual Quiz - 3 Questions
Test your understanding
According to the Process Table, what is the partition size after step 2?
A. 900000 bytes
B. 500000 bytes
C. 1200000 bytes
D. 700000 bytes
💡 Hint
Check the 'Partition Size (bytes)' column at step 2 in the Process Table.
At which step does the time-based retention policy first delete messages?
A. Step 2
B. Step 4
C. Step 3
D. Step 5
💡 Hint
Look for 'Time check' in the 'Retention Check' column of the Process Table.
If the retention time were increased to 2 minutes, what would happen at step 3?
A. Oldest message deleted due to time
B. No deletion due to time, only size-based deletion if needed
C. Partition size would reset to zero
D. All messages deleted
💡 Hint
Refer to the time-based retention check at step 3 and consider the new retention time.
Concept Snapshot
Retention policies in Kafka control how long or how much data is kept.
Time-based retention deletes messages older than a set time (e.g., 1 minute).
Size-based retention deletes oldest messages if partition size exceeds a limit (e.g., 1 MB).
Kafka checks both policies periodically and removes data in whole log segments, so deletion is not instantaneous.
This bounds disk usage and prevents stale data from accumulating.
Full Transcript
Kafka stores messages in partitions. Each message has a timestamp and size. Kafka uses retention policies to decide when to delete old messages. Time-based retention deletes messages older than a configured time, like 1 minute. Size-based retention deletes oldest messages if the partition size grows beyond a set limit, like 1 MB. In the example, messages are produced with timestamps and sizes. At step 3, the time-based policy deletes messages older than 1 minute. At step 4, size-based policy deletes messages because the partition size is too big. This process repeats to keep data within limits. Understanding these policies helps manage Kafka storage efficiently.