Cross-datacenter replication in Kafka - Time & Space Complexity
When Kafka replicates data across datacenters, it copies messages from a cluster in one datacenter to a cluster in another to keep the two in sync.
We want to understand how the time to replicate grows as the amount of data increases.
Analyze the time complexity of the following code snippet.
```java
// Simplified Kafka cross-datacenter replication logic
for (Message msg : localPartition) {
    sendToRemoteDatacenter(msg);
    waitForAck(msg);
}
```
This code sends each message from a local partition to a remote datacenter and waits for its acknowledgment before sending the next one.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Looping through each message in the local partition.
- How many times: Once per message, so as many times as there are messages.
As the number of messages grows, the time to send and wait for acknowledgments grows too.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 sends and waits |
| 100 | 100 sends and waits |
| 1000 | 1000 sends and waits |
Pattern observation: The time grows directly with the number of messages; doubling messages doubles the work.
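The counts in the table can be reproduced with a minimal sketch. It does not use a real Kafka client: the class `SerialReplicationSketch` and its methods `replicate` and `makeMessages` are hypothetical names introduced here for illustration, and the send and the acknowledgment wait are replaced by a counter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the replication loop; no real Kafka client is used.
final class SerialReplicationSketch {

    // Counts one send plus one acknowledgment wait as a single round trip.
    static int replicate(List<String> localPartition) {
        int roundTrips = 0;
        for (String msg : localPartition) {
            // In the original snippet, sendToRemoteDatacenter(msg) and
            // waitForAck(msg) would run here, once per message.
            roundTrips++;
        }
        return roundTrips;
    }

    // Builds n placeholder messages to act as the local partition.
    static List<String> makeMessages(int n) {
        List<String> messages = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            messages.add("msg-" + i);
        }
        return messages;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            int roundTrips = replicate(makeMessages(n));
            System.out.println(n + " messages -> " + roundTrips + " sends and waits");
        }
    }
}
```

Running `main` prints one line per input size, and the count on each line equals the number of messages, which is the linear pattern the table describes.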
Time Complexity: O(n)
This means replication time grows linearly with the number of messages: each message costs one send plus one wait for its acknowledgment, so n messages cost n such round trips.
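As a rough worked example, assume every send-and-wait takes about the same round-trip time $t_{\mathrm{rtt}}$. This is a simplifying assumption; real cross-datacenter latency varies from message to message. Under that assumption the total time is approximately:

$$
T(n) \approx n \cdot t_{\mathrm{rtt}}
$$

With an illustrative (not measured) value of $t_{\mathrm{rtt}} = 50\,\mathrm{ms}$, replicating $n = 1000$ messages would take about $1000 \times 50\,\mathrm{ms} = 50\,\mathrm{s}$, and doubling $n$ to 2000 would double that to about $100\,\mathrm{s}$.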
[X] Wrong: "Replication time stays the same no matter how many messages there are."
[OK] Correct: Each message must be sent and confirmed, so more messages mean more work and more time.
Understanding how replication time grows helps you design systems that stay fast even as data grows.
"What if we send messages in batches instead of one by one? How would the time complexity change?"
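One way to explore this question is to count round trips when several messages share a single send and a single acknowledgment. The sketch below is only a counting model: `BatchedReplicationSketch`, `roundTrips`, and the fixed batch size of 100 are assumptions made for illustration, not part of the original snippet or of any Kafka API.

```java
// Hypothetical counting model for batched replication; no real Kafka client is used.
final class BatchedReplicationSketch {

    // Round trips needed when up to batchSize messages share one send and one ack.
    static int roundTrips(int messageCount, int batchSize) {
        if (messageCount < 0) {
            throw new IllegalArgumentException("messageCount must not be negative");
        }
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // Ceiling division: a partially filled last batch still needs its own round trip.
        return (messageCount + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        int batchSize = 100; // assumed value, chosen only for illustration
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println(n + " messages -> " + roundTrips(n, batchSize) + " round trips");
        }
    }
}
```

Under this model, a fixed batch size b turns n round trips into about n / b. The growth is still linear, so the complexity remains O(n), but the constant factor shrinks by roughly b. The amount of data transferred is unchanged, since every message still has to cross the network.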