Kafka · Comparison · Beginner · 4 min read

Delete vs Compact Cleanup Policy in Kafka: Key Differences and Usage

In Kafka, the delete cleanup policy removes messages after a retention time or size limit, while the compact policy keeps only the latest message per key, removing older duplicates. Delete is for time-based cleanup, and compact is for maintaining the latest state per key.
⚖️

Quick Comparison

This table summarizes the main differences between the delete and compact cleanup policies in Kafka.

| Aspect | Delete Cleanup Policy | Compact Cleanup Policy |
| --- | --- | --- |
| Purpose | Remove old messages after retention time or size | Keep only the latest message per key, remove duplicates |
| Data Retention | Based on time or size limits | Based on key uniqueness, no time limit |
| Use Case | Log data, event streams with expiry | State stores, changelogs, latest updates |
| Message Removal | Deletes messages older than retention | Deletes older messages with the same key |
| Data Integrity | May lose data after retention | Always keeps latest state per key |
| Performance Impact | Simple deletion, less CPU | More CPU for compaction process |
⚖️

Key Differences

The delete cleanup policy in Kafka removes messages based on a configured retention time (retention.ms) or size limit (retention.bytes). When messages exceed these limits, Kafka deletes them to free up space; in practice, deletion happens at the granularity of whole log segments, so removal is not instantaneous. This policy is useful when you want to keep data only for a limited time, such as logs or event streams that lose relevance after a while.
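The retention behavior can be sketched in a few lines of Python. This is a simplified model for intuition only, not Kafka's actual implementation (real Kafka deletes whole segments, not individual records):

```python
def apply_delete_policy(log, retention_ms, now_ms):
    """Keep only records newer than the retention window.

    `log` is a list of (timestamp_ms, key, value) tuples. This is a
    simplification: real Kafka removes entire log segments whose
    newest record is older than the cutoff.
    """
    cutoff = now_ms - retention_ms
    return [rec for rec in log if rec[0] >= cutoff]

log = [
    (10_000, "key1", "message1"),  # old, beyond the retention window
    (95_000, "key2", "message2"),  # recent
    (99_000, "key1", "message3"),  # recent
]
print(apply_delete_policy(log, retention_ms=60_000, now_ms=100_000))
# [(95000, 'key2', 'message2'), (99000, 'key1', 'message3')]
```

Note that the old record is dropped regardless of its key: the delete policy never looks at keys, only at age (or total size).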

On the other hand, the compact cleanup policy focuses on keeping the latest message for each unique key. Kafka scans the log in the background and removes older messages with the same key, ensuring that only the most recent update per key remains; producing a message with a null value (a tombstone) marks the key itself for deletion. This is ideal for use cases such as maintaining a current state or changelog, where you always want the latest information without duplicates.
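The compaction semantics can likewise be sketched in Python. Again, this is a toy model of "keep the newest record per key", not Kafka's actual segment-by-segment cleaner, and it ignores tombstone handling:

```python
def compact(log):
    """Keep only the newest record per key.

    `log` is a list of (key, value) pairs in offset order. Returns
    (offset, key, value) tuples for the surviving records, still in
    offset order, as a compacted log would.
    """
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later offsets overwrite earlier ones
    return sorted((off, key, val) for key, (off, val) in latest.items())

log = [("key1", "message1"), ("key2", "message2"), ("key1", "message3")]
print(compact(log))
# [(1, 'key2', 'message2'), (2, 'key1', 'message3')]
```

The first key1 record disappears because key1 was written again later, while key2's only record survives untouched.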

While delete is time- or size-driven, compact is key-driven. Compaction requires more CPU because Kafka must track keys and run background cleaner threads. Delete is simpler and faster but loses data once retention expires. Compact preserves the latest value per key but may keep data indefinitely as long as keys keep receiving updates.

💻

Delete Cleanup Policy Example

This example shows how to configure a Kafka topic with the delete cleanup policy and produce messages.

```bash
# Create a topic with the delete cleanup policy and 1-minute retention
bin/kafka-topics.sh --create --topic delete-topic \
  --bootstrap-server localhost:9092 \
  --partitions 1 --replication-factor 1 \
  --config cleanup.policy=delete \
  --config retention.ms=60000

# Produce a few messages (without parse.key, each full line is the message value)
bin/kafka-console-producer.sh --topic delete-topic --bootstrap-server localhost:9092
key1:message1
key2:message2
key1:message3
```
Output
Topic 'delete-topic' is created with the delete cleanup policy and 1-minute retention, and the three messages are produced to it. Messages older than one minute become eligible for deletion; because Kafka deletes whole log segments, removal happens some time after retention expires rather than immediately.
↔️

Compact Cleanup Policy Equivalent

This example shows how to configure a Kafka topic with the compact cleanup policy and produce messages with keys.

```bash
# Create a topic with the compact cleanup policy
bin/kafka-topics.sh --create --topic compact-topic \
  --bootstrap-server localhost:9092 \
  --partitions 1 --replication-factor 1 \
  --config cleanup.policy=compact

# Produce keyed messages; parse.key splits each line at the separator
bin/kafka-console-producer.sh --topic compact-topic --bootstrap-server localhost:9092 \
  --property parse.key=true --property key.separator=:
key1:message1
key2:message2
key1:message3
```
Output
Topic 'compact-topic' is created with the compact cleanup policy, and the three keyed messages are produced to it. Once background compaction runs, only the latest message per key remains: key1:message3 eventually replaces key1:message1, while key2:message2 is kept.
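To verify the setup, you can describe the topic's configuration and re-read the topic with keys printed. This assumes a broker on localhost:9092 and the Kafka scripts under bin/, as in the examples above; note that records written recently may still sit in the uncompacted head of the log, so duplicates can appear until the cleaner runs:

```bash
# Show the topic's effective configuration, including cleanup.policy
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name compact-topic --describe

# Read the topic from the beginning, printing key:value pairs
bin/kafka-console-consumer.sh --topic compact-topic \
  --bootstrap-server localhost:9092 --from-beginning \
  --property print.key=true --property key.separator=:
```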
🎯

When to Use Which

Choose the delete cleanup policy when you want to remove old data after a certain time or size, such as logs or event streams that do not need to be kept forever. It is simple and efficient for time-based retention.

Choose the compact cleanup policy when you need to maintain the latest state per key, like in changelog topics or state stores. It ensures you always have the most recent update for each key, which is critical for stateful stream processing.

In some cases, you can combine both policies (cleanup.policy=compact,delete) to keep the latest state per key while also removing any data older than the retention period.
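A combined policy is configured the same way as the earlier examples; this sketch uses a hypothetical topic name (changelog-topic) and a 7-day retention, both placeholders you would adjust:

```bash
# Hypothetical topic: compacted, but records older than 7 days are
# also removed (cleanup.policy accepts a comma-separated list)
bin/kafka-topics.sh --create --topic changelog-topic \
  --bootstrap-server localhost:9092 \
  --partitions 1 --replication-factor 1 \
  --config cleanup.policy=compact,delete \
  --config retention.ms=604800000
```

With this configuration, each key's latest value survives compaction only as long as it is newer than the retention window.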

Key Takeaways

Delete policy removes messages after a retention time or size limit.
Compact policy keeps only the latest message per key, removing duplicates.
Use delete for time-based log cleanup and compact for stateful data.
Compaction uses more CPU but preserves the latest state per key.
You can combine both policies for flexible retention and compaction.