Disaster recovery planning in Kafka - Time & Space Complexity
When planning disaster recovery in Kafka, it's important to understand how recovery time scales with data size: how does the time to restore change as the number of messages or partitions grows? Let's analyze the time complexity of restoring topic data from a Kafka backup.
```
// Pseudocode for restoring Kafka topic data
for each partition in topicPartitions {
    for each message in partitionBackup {
        produce message to Kafka topic partition
    }
}
```
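The pseudocode above can be sketched as a runnable simulation. This is not a real Kafka client: the produce call is replaced by a counter, and `partition_backups` is hypothetical sample data, which makes the per-message operation count visible without a broker.

```python
# Simulated restore: one "send" per message per partition.
# The real produce call is replaced by a counter so the example
# runs without a Kafka broker; the backup data is hypothetical.

def restore_from_backup(partition_backups):
    """Replay every backed-up message; return the total number of sends."""
    sends = 0
    for partition, messages in partition_backups.items():
        for message in messages:
            # In a real restore this would be something like
            # producer.send(topic, value=message, partition=partition)
            sends += 1
    return sends

backups = {0: ["m1", "m2"], 1: ["m3"], 2: ["m4", "m5", "m6"]}
print(restore_from_backup(backups))  # 6 sends for 6 messages
```

Counting sends instead of timing them keeps the focus on the growth pattern: the total is always exactly the number of backed-up messages.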
This code restores a topic by replaying every backed-up message into its original partition.
Look at what repeats in the code.
- Primary operation: Sending each message back to Kafka.
- How many times: Once for every message in every partition backup.
As the number of messages grows, the time to restore grows too.
| Input Size (messages) | Approx. Operations |
|---|---|
| 10 | 10 sends |
| 100 | 100 sends |
| 1000 | 1000 sends |
Pattern observation: The time grows directly with the number of messages to restore.
Time Complexity: O(n)
This means recovery time grows linearly: doubling the number of messages to restore roughly doubles the restore time.
[X] Wrong: "Recovery time stays the same no matter how many messages there are."
[OK] Correct: Each message must be replayed, so more messages mean more work and longer recovery.
Understanding how recovery time scales helps you design systems that can bounce back quickly after problems.
"What if we parallelize restoring partitions? How would that affect the time complexity?"
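One way to explore that question: parallelizing across partitions does not change the total work, which remains O(n) sends, but it can reduce wall-clock time toward O(n/p) with p workers, bounded below by the largest single partition. A minimal sketch, again using a counter as a stand-in for a real producer:

```python
from concurrent.futures import ThreadPoolExecutor

def restore_partition(messages):
    # Stand-in for producing each backed-up message to its partition.
    return len(messages)

def parallel_restore(partition_backups, workers=4):
    # Total work is still O(n) sends; wall-clock time is limited by
    # the slowest (largest) partition when partitions restore concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(restore_partition, partition_backups.values()))

backups = {0: ["m1", "m2"], 1: ["m3"], 2: ["m4", "m5", "m6"]}
print(parallel_restore(backups))  # still 6 sends in total
```

In practice the speedup also depends on broker throughput and how evenly messages are spread across partitions; a single huge partition still restores at single-worker speed.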