Kafka · DevOps · ~15 mins

Retention policies (time-based, size-based) in Kafka - Deep Dive

Overview - Retention policies (time-based, size-based)
What is it?
Retention policies in Kafka control how long or how much data is kept in a topic before it is deleted. Time-based retention deletes messages older than a set time, while size-based retention deletes messages when the topic's data exceeds a set size. These policies help manage storage and ensure Kafka does not run out of space. They work automatically in the background without manual intervention.
Why it matters
Without retention policies, Kafka topics could grow endlessly, filling up disk space and causing system failures. Retention policies keep data manageable and predictable, allowing Kafka to run smoothly and reliably. They also help balance between keeping enough data for consumers and freeing up resources. This makes Kafka practical for real-world use where data volume is large and continuous.
Where it fits
Before learning retention policies, you should understand Kafka topics, partitions, and how Kafka stores messages. After mastering retention policies, you can explore Kafka compaction, consumer groups, and data cleanup strategies. Retention policies are part of Kafka's data lifecycle management.
Mental Model
Core Idea
Retention policies automatically remove old or excess data from Kafka topics to keep storage under control and ensure system stability.
Think of it like...
It's like a refrigerator that automatically throws away food after a certain date or when it gets too full, so it never overflows and always has space for fresh items.
┌───────────────────────────────┐
│         Kafka Topic           │
│ ┌───────────────┐             │
│ │   Messages   │             │
│ │ ┌───────────┐ │             │
│ │ │ Time-based│ │             │
│ │ │ Retention │ │             │
│ │ └───────────┘ │             │
│ │ ┌───────────┐ │             │
│ │ │ Size-based│ │             │
│ │ │ Retention │ │             │
│ │ └───────────┘ │             │
│ └───────────────┘             │
└───────────────────────────────┘
Build-Up - 7 Steps
1
FoundationWhat is Kafka Retention Policy
🤔
Concept: Introduction to the idea that Kafka deletes old data automatically based on rules.
Kafka stores messages in topics. Without limits, these topics grow forever. Retention policies tell Kafka when to delete old messages to save space. Two main types exist: time-based and size-based retention.
Result
Learner understands that retention policies prevent infinite data growth in Kafka topics.
Knowing that Kafka manages data lifecycle automatically helps avoid manual cleanup and system crashes.
2
FoundationDifference Between Time and Size Retention
🤔
Concept: Explaining the two main retention types and how they decide what to delete.
Time-based retention deletes messages older than a set time (e.g., 7 days). Size-based retention deletes messages when the total topic size exceeds a limit (e.g., 1 GB). Both work independently and can be combined.
Result
Learner can distinguish between deleting data by age versus by total size.
Understanding these two methods clarifies how Kafka balances data availability and storage limits.
3
IntermediateConfiguring Time-Based Retention in Kafka
🤔Before reading on: do you think setting retention.ms to 86400000 deletes messages older than 1 hour or 1 day? Commit to your answer.
Concept: How to set time-based retention using Kafka configuration properties.
Kafka uses the topic-level property retention.ms to set time-based retention in milliseconds. For example, retention.ms=604800000 means messages older than 7 days are deleted. It can be set per topic (overriding the default) or cluster-wide via the broker defaults log.retention.ms or log.retention.hours.
Result
Learner knows how to configure time-based retention and what the values mean.
Knowing the exact config key and units prevents common mistakes that cause unexpected data loss or retention.
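Because retention.ms takes raw milliseconds, unit slips (the 1-hour vs. 1-day trap above) are easy. A small sketch in Python derives the value instead of hand-typing it; the helper names are my own, not Kafka's:

```python
# Derive retention.ms values from human-readable durations instead of
# typing raw millisecond counts (helper names are illustrative).

MS_PER_HOUR = 60 * 60 * 1000
MS_PER_DAY = 24 * MS_PER_HOUR

def retention_ms(days=0, hours=0):
    """Return a retention.ms value for the given duration."""
    return days * MS_PER_DAY + hours * MS_PER_HOUR

print(retention_ms(days=7))  # 604800000 -- the 7-day example above
print(retention_ms(days=1))  # 86400000  -- 1 day, not 1 hour
```

Generating the number this way makes the intended duration visible in code review instead of hiding it in a nine-digit constant.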
4
IntermediateConfiguring Size-Based Retention in Kafka
🤔Before reading on: does retention.bytes limit the size per partition or the whole topic? Commit to your answer.
Concept: How to set size-based retention using Kafka configuration properties.
Kafka uses the topic-level property retention.bytes to limit the size of data per partition (the broker-wide default is log.retention.bytes). When a partition exceeds this, its oldest segments are deleted. For example, retention.bytes=1073741824 caps each partition at 1 GiB. This helps control disk usage precisely.
Result
Learner understands how to limit topic size and the scope of size limits.
Knowing size limits apply per partition helps design topics with correct partition counts and storage planning.
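Since the limit is per partition, a worst-case disk estimate has to multiply by partition count and replication factor. A rough Python sketch with made-up numbers:

```python
# retention.bytes caps each partition, not the topic, so worst-case
# cluster disk usage scales with partitions and replication factor.
# All numbers below are illustrative, not recommendations.

ONE_GIB = 1024 ** 3  # 1073741824, matching the example above

def worst_case_topic_bytes(retention_bytes, partitions, replication_factor):
    """Upper bound on disk used by one topic across the cluster."""
    return retention_bytes * partitions * replication_factor

# A "1 GB" topic with 12 partitions and replication factor 3:
total = worst_case_topic_bytes(ONE_GIB, partitions=12, replication_factor=3)
print(total // ONE_GIB)  # 36 -- GiB across the cluster, not 1
```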
5
IntermediateHow Kafka Deletes Messages Internally
🤔
Concept: Understanding Kafka's background process that removes expired data.
Kafka runs a background cleaner that checks closed log segments against the retention policies. It deletes a segment (file) only when it is fully expired by time or pushes the partition over the size limit; the active segment currently being written is never deleted. Partially expired segments are kept until every record in them has expired. This process is automatic and transparent to users.
Result
Learner understands the deletion process is segment-based, not message-by-message.
Knowing segment-level deletion explains why retention is approximate and why some old messages may linger briefly.
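The segment-level behavior can be mimicked in a few lines of Python. This is a toy model: the field names and structure are illustrative, not Kafka internals:

```python
# Toy model of time-based cleanup: Kafka deletes whole segment files,
# and only when the NEWEST record in the segment has expired.

def expired_segments(segments, retention_ms, now_ms, active_index):
    """Return indices of closed segments the cleaner would delete."""
    doomed = []
    for i, seg in enumerate(segments):
        if i == active_index:
            continue  # the active (still-written) segment is never deleted
        if now_ms - seg["max_timestamp_ms"] > retention_ms:
            doomed.append(i)
    return doomed

DAY = 86_400_000
now = 100 * DAY
segments = [
    {"max_timestamp_ms": now - 9 * DAY},  # fully expired -> deleted
    {"max_timestamp_ms": now - 6 * DAY},  # newest record still fresh, so
                                          # older records in it linger on
    {"max_timestamp_ms": now},            # active segment
]
print(expired_segments(segments, retention_ms=7 * DAY, now_ms=now,
                       active_index=2))  # [0]
```

The middle segment survives even though some of its records may be older than 7 days, which is exactly why retention timing is approximate.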
6
AdvancedCombining Time and Size Retention Policies
🤔Before reading on: if both time and size limits are set, which one triggers deletion first? Commit to your answer.
Concept: How Kafka applies both retention policies together to decide when to delete data.
Kafka deletes data when either the time limit or size limit is exceeded. This means messages older than retention.ms or when partition size exceeds retention.bytes will be removed. This dual check ensures flexible control over data lifecycle.
Result
Learner understands that retention policies work as OR conditions, not AND.
Knowing this prevents surprises where data is deleted earlier than expected due to size limits.
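The OR semantics can be sketched like this (a toy model with invented sizes and timestamps, not Kafka's actual cleaner code):

```python
# Toy model of the OR semantics: data goes as soon as EITHER the time
# limit or the size limit is breached.

def cleanup(segments, retention_ms, retention_bytes, now_ms):
    """Oldest-first deletion until both limits are satisfied."""
    # Time check: drop segments whose newest record has expired.
    kept = [s for s in segments if now_ms - s["max_ts"] <= retention_ms]
    # Size check: keep dropping the oldest while the partition is too big
    # (real Kafka always retains at least the active segment).
    while len(kept) > 1 and sum(s["bytes"] for s in kept) > retention_bytes:
        kept.pop(0)
    return kept

DAY = 86_400_000
now = 100 * DAY
segments = [
    {"max_ts": now - 2 * DAY, "bytes": 600},
    {"max_ts": now - 1 * DAY, "bytes": 600},
    {"max_ts": now,           "bytes": 300},
]
# Nothing is older than 7 days, yet the 1000-byte cap still deletes data:
kept = cleanup(segments, retention_ms=7 * DAY, retention_bytes=1000,
               now_ms=now)
print(len(kept))  # 2
```

Here the size limit fires well before any record reaches the time limit, illustrating why data can vanish "early" on high-throughput topics.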
7
ExpertRetention Policy Impact on Consumer Behavior
🤔Before reading on: do retention policies affect only storage or also consumer message availability? Commit to your answer.
Concept: How retention policies influence what messages consumers can read and potential data loss scenarios.
Retention policies delete messages permanently, so consumers that fall too far behind may miss data. Retention settings must therefore balance storage limits against consumer needs. Note that compacted topics behave differently: with cleanup.policy=compact, Kafka keeps the latest record per key regardless of retention.ms or retention.bytes.
Result
Learner understands retention affects data availability and consumer design.
Knowing retention impacts consumers helps design systems that avoid data loss and ensure timely processing.
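One hedged way to reason about consumer safety is to require that the retention window comfortably exceeds the longest outage a consumer might experience. An illustrative check (the safety factor is an assumption of mine, not a Kafka rule):

```python
# A consumer offline (or lagging) longer than the retention window
# silently loses records. Crude planning check with invented numbers.

def survives_outage(retention_ms, max_outage_ms, safety_factor=2):
    """True if retention covers the worst-case outage with headroom."""
    return retention_ms >= max_outage_ms * safety_factor

DAY = 86_400_000
print(survives_outage(retention_ms=7 * DAY, max_outage_ms=3 * DAY))  # True
print(survives_outage(retention_ms=1 * DAY, max_outage_ms=3 * DAY))  # False
```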
Under the Hood
Kafka stores each partition's messages in log segments on disk. Each segment is a file with messages ordered by offset. The retention cleaner scans closed segments periodically. If every message in a segment is older than retention.ms (Kafka checks the segment's largest timestamp) or the partition's total size exceeds retention.bytes, the entire segment file is deleted. Partially expired segments are kept until fully expired, and the active segment is never deleted. This design optimizes disk I/O and avoids deleting individual messages, which would be inefficient.
Why designed this way?
Segment-based retention was chosen to optimize performance and reduce overhead. Deleting whole files is faster and simpler than deleting individual messages. Also, Kafka's append-only log design fits well with segment deletion. Alternatives like per-message deletion would slow down writes and complicate storage management. The dual retention policies provide flexible control for different use cases.
┌───────────────┐
│ Kafka Topic   │
│ ┌───────────┐ │
│ │ Partition │ │
│ │ ┌───────┐ │ │
│ │ │Segment│ │ │
│ │ │ Files │ │ │
│ │ └───────┘ │ │
│ └───────────┘ │
└─────┬─────────┘
      │ Cleaner scans segments
      ▼
┌─────────────────────────┐
│ Check segment age & size│
│ If expired, delete file │
└─────────────────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does retention.ms delete messages exactly at the time limit or sometime after? Commit to one.
Common Belief:retention.ms deletes messages exactly when they reach the time limit.
Reality:retention.ms triggers deletion only once the entire segment containing those messages has expired and been closed, which can be noticeably later than the exact time limit.
Why it matters:Expecting exact deletion timing can cause confusion when old messages appear longer than expected, leading to wrong assumptions about retention settings.
Quick: Does retention.bytes limit the total topic size or per partition? Commit to one.
Common Belief:retention.bytes limits the total size of the entire topic.
Reality:retention.bytes limits the size per partition, not the whole topic.
Why it matters:Misunderstanding this can cause storage planning errors, especially with many partitions, leading to unexpected disk usage.
Quick: Does retention policy affect compacted topics the same way as normal topics? Commit to yes or no.
Common Belief:Retention policies delete messages in compacted topics just like normal topics.
Reality:Compacted topics (cleanup.policy=compact) keep the latest message per key regardless of retention.ms or retention.bytes; only the compact,delete policy applies both, so retention behaves differently there.
Why it matters:Confusing this can cause data loss or misunderstanding of compacted topic behavior, affecting critical data retention.
Quick: Can retention policies cause consumers to lose messages if they read late? Commit to yes or no.
Common Belief:Retention policies only affect storage and do not impact consumer message availability.
Reality:Retention policies delete messages permanently, so late consumers may miss data if retention expires before they read.
Why it matters:Ignoring this can lead to data loss in consumer applications and unexpected bugs.
Expert Zone
1
Retention policies operate at the partition segment level, so message deletion is approximate and depends on segment boundaries.
2
Setting very low retention.ms only takes effect once segments roll, so it usually must be paired with a lower segment.ms or segment.bytes; the resulting frequent segment rolls and deletions add I/O overhead and can surprise lagging consumers.
3
Topics with cleanup.policy=compact,delete combine time/size retention with key-based compaction, requiring careful tuning to avoid unintended data loss.
When NOT to use
Retention policies are not suitable when you need to keep all data indefinitely or require precise message-level deletion. In such cases, use Kafka log compaction or external storage systems like HDFS or cloud storage for archiving.
Production Patterns
In production, teams set retention.ms to balance data freshness and storage cost, often using size-based retention to prevent disk overflow. They monitor topic sizes and consumer lag to adjust policies dynamically. Compacted topics are used for changelog or state data, while time/size retention is used for event streams.
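For sizing, a common back-of-the-envelope: under time-based retention, steady-state disk usage per partition is roughly the ingest rate multiplied by the retention window. A sketch with illustrative numbers, not recommendations:

```python
# Back-of-the-envelope sizing under time-based retention:
# steady-state bytes per partition ~= ingest rate * retention window.

def steady_state_bytes(bytes_per_sec, retention_ms):
    """Approximate on-disk bytes one partition settles at."""
    return bytes_per_sec * (retention_ms // 1000)

DAY = 86_400_000
# 100 KiB/s into a single partition, kept for 7 days:
est = steady_state_bytes(100 * 1024, 7 * DAY)
print(round(est / 1024 ** 3, 1))  # ~57.7 GiB per partition, per replica
```

Estimates like this show why teams often add retention.bytes as a safety net even when time-based retention is the primary policy.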
Connections
Database Archiving
Both manage data lifecycle by removing old or excess data to save space.
Understanding retention policies in Kafka helps grasp how databases archive or purge old records to maintain performance.
Garbage Collection in Programming
Retention policies are like garbage collectors that remove unused data automatically.
Knowing how garbage collection frees memory clarifies why Kafka deletes whole segments instead of individual messages for efficiency.
Refrigerator Food Management
Both use time and space limits to decide when to discard items.
This cross-domain insight shows how everyday systems balance freshness and capacity, similar to Kafka's data retention.
Common Pitfalls
#1Setting retention.ms too low causing premature data loss.
Wrong approach:retention.ms=60000 # 1 minute retention
Correct approach:retention.ms=604800000 # 7 days retention
Root cause:Misunderstanding the time unit (milliseconds) and setting too small a value leads to losing data consumers haven't processed yet.
#2Assuming retention.bytes limits total topic size instead of per partition.
Wrong approach:retention.bytes=1073741824 # expecting 1GB total topic size
Correct approach:retention.bytes=1073741824 # actually 1GB per partition, adjust partition count accordingly
Root cause:Not knowing retention.bytes applies per partition causes storage planning errors and unexpected disk usage.
#3Expecting retention policies to delete messages immediately at expiration time.
Wrong approach:Believing messages older than retention.ms are deleted instantly.
Correct approach:Understanding that deletion happens when entire segments expire, so some old messages may remain briefly.
Root cause:Ignoring Kafka's segment-based storage model leads to wrong expectations about deletion timing.
Key Takeaways
Kafka retention policies automatically delete old or excess data to control storage and system health.
Time-based retention deletes messages older than a set time, while size-based retention deletes when partition size exceeds a limit.
Retention policies operate at the segment level, so deletion timing is approximate, not exact per message.
retention.bytes limits size per partition, not the whole topic, which affects storage planning.
Retention policies impact data availability for consumers, so settings must balance storage and processing needs.