Common connectors (JDBC, S3, Elasticsearch) in Kafka - Time & Space Complexity
When Kafka Connect moves data between Kafka and systems like JDBC databases, S3 storage, or Elasticsearch, it runs tasks that poll and forward records.
We want to know how the running time grows as the amount of data or the number of requests increases.
Analyze the time complexity of this Kafka connector polling and processing loop.
```
while (true) {
    // Fetch the next batch of records from the source (JDBC, S3, Elasticsearch)
    records = pollFromSource();
    for (record of records) {
        processRecord(record);  // transform/convert the record
        sendToKafka(record);    // produce the record to a Kafka topic
    }
    // Record progress so a restart resumes from the last committed position
    commitOffsets();
}
```
This code polls data from a source like JDBC, S3, or Elasticsearch, processes each record, sends it to Kafka, and commits progress.
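The loop above can be sketched as a small runnable simulation. The function names below (`poll_from_source`, `process_record`, `send_to_kafka`) are hypothetical stand-ins for the real source, transformation, and producer calls, not the actual Kafka Connect API:

```python
def poll_from_source(batch):
    """Stand-in for a source poll (e.g., a JDBC query or an S3 list call)."""
    return batch

def process_record(record):
    """Stand-in for per-record transformation."""
    return record.upper()

def send_to_kafka(record, sink):
    """Stand-in for producing the record to a Kafka topic."""
    sink.append(record)

def run_one_poll_cycle(batch, sink):
    """One iteration of the connector loop: poll, then process and send each record."""
    records = poll_from_source(batch)
    for record in records:  # O(n) over the records in this batch
        send_to_kafka(process_record(record), sink)
    # commit_offsets() would go here in a real connector
    return len(records)

sink = []
processed = run_one_poll_cycle(["a", "b", "c"], sink)
print(processed, sink)  # 3 ['A', 'B', 'C']
```

The inner `for` loop is the part whose cost scales with input size; everything else in the cycle is fixed per poll.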
Look at what repeats as data grows.
- Primary operation: Loop over all records fetched in each poll.
- How many times: Once per record in the batch each poll cycle.
More records mean more processing steps.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 process and send steps |
| 100 | About 100 process and send steps |
| 1000 | About 1000 process and send steps |
Pattern observation: The work grows directly with the number of records.
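The table's pattern can be checked empirically with a simple operation counter (a sketch for illustration, not connector code):

```python
def count_operations(n):
    """Count the process-and-send steps for a batch of n records."""
    ops = 0
    for _ in range(n):
        ops += 1  # one processRecord + sendToKafka pair per record
    return ops

for n in (10, 100, 1000):
    print(n, count_operations(n))  # 10 10, 100 100, 1000 1000
```

Doubling the number of records doubles the operation count, which is the signature of linear growth.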
Time Complexity: O(n)
This means processing time grows linearly with the number of records: twice the records, roughly twice the time.
[X] Wrong: "The connector time stays the same no matter how many records come in."
[OK] Correct: Each record needs processing, so more records mean more time.
Understanding how connectors scale with data size helps you design systems that handle growth smoothly.
What if the connector batches records differently, fetching larger chunks less often? How would the time complexity change?
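One way to explore this question: with batch size b, the connector makes roughly n / b poll cycles, but each cycle still does O(b) per-record work, so total record work stays O(n). Larger batches only amortize the fixed per-cycle overhead (polling, committing). A sketch with a made-up fixed overhead per poll:

```python
import math

def total_steps(n, batch_size, per_poll_overhead=5):
    """Total steps to move n records: per-record work plus fixed overhead per poll."""
    polls = math.ceil(n / batch_size)
    return n + polls * per_poll_overhead  # O(n) record work + O(n/b) overhead

# Bigger batches cut overhead, but the per-record term is unchanged.
print(total_steps(1000, 10))    # 1000 + 100*5 = 1500
print(total_steps(1000, 100))   # 1000 + 10*5  = 1050
```

So the complexity class stays O(n); batching changes the constant factors, not the growth rate.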