Document ID strategies (auto vs manual) in Elasticsearch - Performance Comparison
When storing documents in Elasticsearch, the way we assign IDs affects how fast indexing runs.
We want to know how the choice between automatic and manual IDs changes the work Elasticsearch does.
Analyze the time complexity of indexing documents with auto-generated IDs vs manual IDs.
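```
# Auto-generated ID: Elasticsearch chooses the _id
POST /my_index/_doc
{ "name": "Alice" }

# Manual ID: the client supplies _id "123"
POST /my_index/_doc/123
{ "name": "Bob" }
```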
The first request lets Elasticsearch create an ID automatically. The second supplies the ID "123" itself; if a document with that ID already exists, it is overwritten.
Look at what Elasticsearch does each time it indexes a document.
- Primary operation: Checking whether a document with the same ID already exists in the shard the document is routed to.
- How many times: Once per document indexed with a manual ID; never for auto-generated IDs.
With a manual ID, Elasticsearch must look up that ID to decide whether to create a new document or overwrite an existing one. With an auto-generated ID, it can skip this lookup.
As the index grows, each lookup has more data to search, so checking manual IDs becomes more expensive.
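The create-or-overwrite decision can be sketched with a toy model. This is a minimal sketch, assuming a single shard represented as a Python dict; `ToyShard`, its `index` method, and the use of `uuid4` in place of Elasticsearch's own ID generator are all hypothetical. A dict lookup stays cheap as it grows, unlike a lookup in a real shard, so the model only counts how many lookups happen.

```python
import uuid


class ToyShard:
    """Hypothetical stand-in for one shard: a dict from ID to document.

    Real Elasticsearch stores documents in Lucene segments; this only
    models *when* an "does this ID exist?" lookup is needed.
    """

    def __init__(self):
        self.docs = {}
        self.id_checks = 0  # number of existence lookups performed

    def index(self, doc, doc_id=None):
        if doc_id is None:
            # Auto ID: uuid4 stands in for Elasticsearch's generator.
            # The new ID is assumed unique, so no lookup is counted.
            doc_id = uuid.uuid4().hex
            self.docs[doc_id] = doc
            return doc_id, "created"
        # Manual ID: look it up to choose between create and overwrite.
        self.id_checks += 1
        result = "updated" if doc_id in self.docs else "created"
        self.docs[doc_id] = doc
        return doc_id, result


shard = ToyShard()
shard.index({"name": "Alice"})              # auto ID, no lookup
print(shard.index({"name": "Bob"}, "123"))  # ('123', 'created')
print(shard.index({"name": "Bob"}, "123"))  # ('123', 'updated')
print(shard.id_checks)                      # 2
```

The `"created"` and `"updated"` strings mirror the `result` field Elasticsearch returns for an index request; the model records one lookup per manual-ID call and none for auto IDs.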
| Documents indexed (n) | ID checks with manual IDs | ID checks with auto IDs |
|---|---|---|
| 10 | 10 | 0 |
| 100 | 100 | 0 |
| 1000 | 1000 | 0 |
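Each manual-ID document requires one lookup, so the number of lookups grows linearly with the number of documents.
Time Complexity: O(n) ID lookups with manual IDs, versus none with auto-generated IDs.
This means the lookup work grows in direct proportion to how many documents you index with manual IDs. Because each lookup also gets more expensive as the shard grows, the real cost rises somewhat faster than the lookup count alone suggests.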
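A simplified cost model makes the comparison explicit. Assume, for illustration only, that storing one document costs a constant $c_{\text{index}}$ and that the existence check for the $i$-th manual-ID document costs $c_{\text{lookup}}(i)$, which does not decrease as the shard grows; both symbols are introduced here and are not Elasticsearch parameters.

$$
\begin{aligned}
T_{\text{auto}}(n) &= n\,c_{\text{index}} \\
T_{\text{manual}}(n) &= n\,c_{\text{index}} + \sum_{i=1}^{n} c_{\text{lookup}}(i)
\end{aligned}
$$

The sum is exactly the lookup work that auto-generated IDs avoid, and it has $n$ terms, which is where the O(n) lookup count comes from. If every lookup were bounded by a constant $C$, then $T_{\text{manual}}(n) \le n\,(c_{\text{index}} + C) = O(n)$; any growth in $c_{\text{lookup}}(i)$ only widens the gap.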
[X] Wrong: "Using manual IDs is always faster because I control the IDs."
[OK] Correct: Manual IDs force Elasticsearch to check whether each ID already exists, an extra per-document lookup that auto-generated IDs avoid and that gets more expensive as the index grows.
Understanding how ID strategies affect indexing cost shows that you can reason about how data-modeling choices impact performance, a key skill in real projects.
"What if we batch index documents with manual IDs instead of one by one? How would the time complexity change?"
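A toy model can help explore this question. It assumes that batching (for example with the Bulk API, `POST /_bulk`) packs many documents into one request, while every manual-ID document is still checked individually on its shard; the `count_work` function and its `batch_size` parameter are hypothetical.

```python
import math


def count_work(n_docs, batch_size, manual_ids=True):
    """Hypothetical toy model: return (requests, id_checks) for n_docs documents.

    Assumes each request carries up to batch_size documents and that every
    manual-ID document still needs its own existence check on its shard.
    """
    if n_docs < 0 or batch_size < 1:
        raise ValueError("n_docs must be >= 0 and batch_size >= 1")
    requests = math.ceil(n_docs / batch_size)  # network round trips
    id_checks = n_docs if manual_ids else 0    # one lookup per manual-ID doc
    return requests, id_checks


for batch_size in (1, 100):
    print(batch_size, count_work(1000, batch_size))
# 1 (1000, 1000)
# 100 (10, 1000)
```

Under this model, batching cuts round trips by roughly a factor of `batch_size`, but the number of ID checks, and so the O(n) lookup complexity, stays the same.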