Index settings (shards, replicas) in Elasticsearch - Time & Space Complexity
In Elasticsearch, an index's shard and replica settings determine how much work the cluster does for indexing and search. The goal here is to understand exactly how the number of primary shards and the number of replicas change that work.
Analyze the time complexity of this index settings configuration.
```json
PUT /my-index
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}
```
This request creates an index with 5 primary shards and 1 replica of each primary shard (10 shard copies in total).
Look at what happens when Elasticsearch processes data with these settings.
- Primary operation: Writing a document to its primary shard (chosen by hashing the document's routing key) and then replicating that write to each replica of the shard.
- How many times: 1 (primary) + r (replicas) writes per document, where r is `number_of_replicas` (here r = 1, so 2 writes per document).
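The fan-out above can be sketched in a few lines. Note this is a simplified model: real Elasticsearch routes with a murmur3 hash of the `_routing` value (the `_id` by default); Python's built-in `hash()` stands in for it, and the function names are illustrative, not Elasticsearch APIs.

```python
NUM_SHARDS = 5      # number_of_shards
NUM_REPLICAS = 1    # number_of_replicas

def route(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick the primary shard for a document: hash modulo shard count."""
    return hash(doc_id) % num_shards

def writes_per_document(num_replicas: int = NUM_REPLICAS) -> int:
    """One write to the primary, plus one write per replica."""
    return 1 + num_replicas

doc_ids = [f"doc-{i}" for i in range(10)]
total_writes = sum(writes_per_document() for _ in doc_ids)
print(total_writes)  # 10 documents * (1 + 1) = 20 shard-level writes
```

Because routing is deterministic, the same document id always lands on the same primary shard, which is also why `number_of_shards` cannot be changed on a live index without reindexing.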
As the data size grows, Elasticsearch splits it across shards and copies it to replicas.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 * (1 + 1) = 20 writes (spread across up to 5 shards) |
| 100 | 100 * (1 + 1) = 200 writes (spread across up to 5 shards) |
| 1000 | 1000 * (1 + 1) = 2000 writes (spread across up to 5 shards) |
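The table's arithmetic can be checked with a one-line model; `total_index_writes` is an illustrative name for this sketch, not an Elasticsearch function.

```python
def total_index_writes(n: int, num_replicas: int) -> int:
    """Total shard-level writes to index n documents:
    one primary write plus one write per replica, per document."""
    return n * (1 + num_replicas)

# Reproduce the table rows for number_of_replicas = 1:
for n in (10, 100, 1000):
    print(n, total_index_writes(n, num_replicas=1))
```

Raising replicas to 2 would turn the last row into 3000 writes, even though the document count is unchanged.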
Pattern observation: The total work grows linearly with data size n, multiplied by (1 + r). The number of shards s allows distributing these writes for better parallelism.
Time Complexity: O(n * (1 + r))
This means the work grows linearly with data size n, multiplied by the replication factor (1 + r). The number of shards s does not change the total number of writes; it only spreads them across shards (and the nodes holding them) so they can proceed in parallel.
[X] Wrong: "Adding more replicas makes indexing faster because data is copied in parallel."
[OK] Correct: Replicas add extra work because each document must be written (1 + r) times, so total indexing work increases as replicas are added; what replicas buy you is read throughput and fault tolerance, not faster writes.
Understanding how shards and replicas affect work helps you explain Elasticsearch performance clearly and confidently.
"What if we increased the number of shards but kept replicas the same? How would the time complexity change?"