Compression (gzip, snappy, lz4) in Kafka - Time & Space Complexity
When Kafka compresses messages, it changes how long it takes to process data. We want to understand how the time needed grows as the data size grows.
The question is: How does compression time increase when we send more data?
Analyze the time complexity of the following Kafka compression code snippet.
// Compress a batch of messages using gzip
val compressedData = CompressionUtils.compress(messagesBatch, CompressionType.GZIP)
// Send compressed data to Kafka broker
producer.send(new ProducerRecord(topic, compressedData))
// Consumer decompresses before processing
val decompressedData = CompressionUtils.decompress(compressedData, CompressionType.GZIP)
// Process messages
processMessages(decompressedData)
This code compresses a batch of messages with gzip before sending, then decompresses them on the consumer side.
Look for repeated work inside compression and decompression.
- Primary operation: Scanning each byte of the message batch to compress or decompress.
- How many times: Once per message batch during compression, and once during decompression.
Compression and decompression time grows roughly in direct proportion to the size of the data.
| Input Size (n bytes) | Approx. Operations |
|---|---|
| 10 KB | 10,000 operations |
| 100 KB | 100,000 operations |
| 1 MB | 1,000,000 operations |
Pattern observation: Doubling the data size roughly doubles the work needed for compression and decompression.
Time Complexity: O(n)
This means the time to compress or decompress grows linearly with the size of the data.
[X] Wrong: "Compression time stays the same no matter how much data we send."
[OK] Correct: Compression must look at every byte, so more data means more work and more time.
Understanding how compression time grows helps you explain performance trade-offs clearly. This skill shows you can think about real system costs, which is valuable in many programming roles.
"What if we switched from gzip to snappy, which is faster but less compressed? How would the time complexity change?"