Source connectors in Kafka - Time & Space Complexity
When working with Kafka source connectors, it's important to understand how processing time grows as the input size increases. In other words, we want to know how the connector's work scales when more data arrives.
Analyze the time complexity of the following source connector poll method.
```java
public List<SourceRecord> poll() throws InterruptedException {
    List<SourceRecord> records = new ArrayList<>();
    while (hasMoreData()) {
        SourceRecord record = readNextRecord();
        records.add(record);
    }
    return records;
}
```
This code reads all available data from the source system and collects it into a list to send to Kafka.
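To make the loop's behavior concrete, here is a minimal self-contained sketch with the same shape as the `poll()` method above. The source system is stubbed with an in-memory queue of plain strings (a stand-in for `SourceRecord`), and `hasMoreData()` / `readNextRecord()` are hypothetical helpers that count how many reads occur:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class PollDemo {
    // Stub source: an in-memory queue stands in for the external system.
    private final Deque<String> source = new ArrayDeque<>(List.of("r1", "r2", "r3"));
    private int reads = 0; // counts how many times readNextRecord() runs

    private boolean hasMoreData() {
        return !source.isEmpty();
    }

    private String readNextRecord() {
        reads++;
        return source.removeFirst();
    }

    // Same shape as the connector's poll(): one read and one add per record.
    public List<String> poll() {
        List<String> records = new ArrayList<>();
        while (hasMoreData()) {
            records.add(readNextRecord());
        }
        return records;
    }

    public static void main(String[] args) {
        PollDemo demo = new PollDemo();
        List<String> out = demo.poll();
        System.out.println(out.size()); // 3 records returned
        System.out.println(demo.reads); // 3 reads performed, one per record
    }
}
```

With 3 records available, the loop body runs exactly 3 times, which is the one-operation-per-record pattern analyzed below.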
Identify the repeated work: loops, recursion, or array traversals.
- Primary operation: The while loop that reads each record one by one.
- How many times: Once for every available record in the source system.
As the number of records to read grows, the time to run this method grows proportionally.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 reads and adds |
| 100 | About 100 reads and adds |
| 1000 | About 1000 reads and adds |
Pattern observation: The work grows directly with the number of records to process.
Time Complexity: O(n)
This means the time to read records grows linearly with the number of records.
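The table above can be reproduced by counting operations directly. The sketch below (an illustrative stand-in, not connector code) loops once per available record, exactly like the `while (hasMoreData())` loop, and reports how many read-and-add operations were needed for each input size:

```java
import java.util.ArrayList;
import java.util.List;

public class LinearGrowthDemo {
    // Counts the read-and-add operations performed for n available records.
    static long operationsFor(int n) {
        List<Integer> records = new ArrayList<>();
        long ops = 0;
        for (int i = 0; i < n; i++) { // stands in for while (hasMoreData())
            records.add(i);           // one read + one add per record
            ops++;
        }
        return ops;
    }

    public static void main(String[] args) {
        System.out.println(operationsFor(10));   // 10
        System.out.println(operationsFor(100));  // 100
        System.out.println(operationsFor(1000)); // 1000
    }
}
```

Doubling the input doubles the operation count, which is the defining behavior of O(n) growth.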
[X] Wrong: "The connector reads all data instantly regardless of size."
[OK] Correct: Each record must be read one by one, so more data means more time.
Understanding how source connectors scale with data size helps you explain real-world data flow and performance in Kafka systems.
"What if the connector batches records in groups instead of one by one? How would the time complexity change?"
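As a starting point for that question, here is a hedged sketch of a batched variant. The `batchSize` cap and the stubbed queue of strings are assumptions for illustration: each `poll()` call now does at most O(batch size) work, but draining all n records still takes O(n) total work spread over roughly n / batchSize calls:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class BatchedPollDemo {
    private final Deque<String> source = new ArrayDeque<>();
    private final int batchSize;

    BatchedPollDemo(int totalRecords, int batchSize) {
        for (int i = 0; i < totalRecords; i++) {
            source.add("r" + i);
        }
        this.batchSize = batchSize;
    }

    private boolean hasMoreData() {
        return !source.isEmpty();
    }

    private String readNextRecord() {
        return source.removeFirst();
    }

    // Returns at most batchSize records per call: each poll() is O(batch size),
    // but reading all n records still costs O(n) across all calls combined.
    public List<String> poll() {
        List<String> records = new ArrayList<>();
        while (hasMoreData() && records.size() < batchSize) {
            records.add(readNextRecord());
        }
        return records;
    }

    public static void main(String[] args) {
        BatchedPollDemo demo = new BatchedPollDemo(10, 4);
        int calls = 0;
        int totalDelivered = 0;
        List<String> batch;
        while (!(batch = demo.poll()).isEmpty()) {
            calls++;
            totalDelivered += batch.size();
        }
        System.out.println(calls);          // 3 calls: batches of 4, 4, and 2
        System.out.println(totalDelivered); // all 10 records delivered
    }
}
```

Batching changes the per-call cost and lets the connector interleave other work between polls, but the overall time to move n records through the connector remains linear in n.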