Bulk indexing optimization in Elasticsearch - Time Complexity
When adding many documents to Elasticsearch at once, it is important to understand how indexing time grows as the number of documents increases, that is, how the bulk indexing process scales with more data.
Analyze the time complexity of the following bulk indexing request.
```
POST /my_index/_bulk
{ "index": { "_id": "1" } }
{ "field": "value1" }
{ "index": { "_id": "2" } }
{ "field": "value2" }
{ "index": { "_id": "3" } }
{ "field": "value3" }
```
This code sends multiple documents in one bulk request to Elasticsearch for indexing.
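To make the structure of the request concrete, here is a minimal Python sketch of how a client might assemble the newline-delimited (NDJSON) body of a `_bulk` request. The helper name `build_bulk_payload` is an illustrative assumption, not an official client API; real clients such as `elasticsearch-py` provide their own bulk helpers.

```python
import json

def build_bulk_payload(docs):
    """Build the NDJSON body of a _bulk request (illustrative sketch).

    Each document contributes two lines: an action line and a source line,
    so the payload size grows linearly with the number of documents.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

payload = build_bulk_payload([
    ("1", {"field": "value1"}),
    ("2", {"field": "value2"}),
    ("3", {"field": "value3"}),
])
print(payload)
```

Note that three documents produce six body lines: the payload, like the indexing work itself, scales with the number of documents.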
Identify the repeated work: the loops, recursion, or traversals that run once per document.
- Primary operation: Processing each document in the bulk request one by one.
- How many times: Once for each document in the bulk batch.
As the number of documents in the bulk request increases, the total work grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 document processes |
| 100 | 100 document processes |
| 1000 | 1000 document processes |
Pattern observation: Doubling the number of documents roughly doubles the work needed.
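The pattern in the table can be checked with a toy cost model. This is a deliberate simplification, assuming one unit of work per document, which is enough to show the linear growth:

```python
def bulk_index_cost(n_docs, cost_per_doc=1):
    """Model bulk indexing as a fixed amount of work per document: O(n)."""
    total = 0
    for _ in range(n_docs):
        total += cost_per_doc  # each document is parsed, analyzed, and written
    return total

for n in (10, 100, 1000):
    print(n, bulk_index_cost(n))

# Doubling the number of documents doubles the work in this model:
assert bulk_index_cost(2000) == 2 * bulk_index_cost(1000)
```

In a real cluster the constant per document varies with mapping complexity, analyzers, and refresh settings, but the linear shape of the growth remains.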
Time Complexity: O(n)
This means the time to index grows linearly with the number of documents sent in the bulk request.
[X] Wrong: "Sending more documents in one bulk request will make indexing time stay the same or grow very little."
[OK] Correct: Each document still needs to be processed, so the total time grows roughly in direct proportion to the number of documents.
Understanding how bulk indexing scales helps you design efficient data loading processes and shows you can reason about performance in real systems.
"What if we split the bulk request into many smaller batches instead of one large batch? How would the time complexity change?"