Cluster, node, and shard architecture in Elasticsearch - Time & Space Complexity
When working with Elasticsearch, it is important to understand how the cluster, nodes, and shards affect performance.
We want to know how the time to search or index data grows as the cluster size or shard count increases.
Analyze the time complexity of querying data spread across shards in a cluster.
GET /my_index/_search
{
  "query": {
    "match": { "field": "value" }
  }
}
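This query searches across all shards of the index, which are distributed over multiple nodes in the cluster. The node that receives the request acts as coordinator: it forwards the query to one copy (primary or replica) of each shard and merges the results.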
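One way to check how many shards a search actually touched is the `_shards` object that Elasticsearch includes in its search responses. The sketch below reads that object from a response already parsed into a Python dict. The helper name `summarize_shards` is hypothetical, and the values in `sample_response` are made up for illustration; they are not output from a real cluster.

```python
def summarize_shards(response: dict) -> str:
    """Summarize the `_shards` section of a parsed search response."""
    shards = response["_shards"]
    return (
        f"{shards['successful']}/{shards['total']} shards answered, "
        f"{shards.get('skipped', 0)} skipped, {shards.get('failed', 0)} failed"
    )


# Illustrative values only, not output from a real cluster.
sample_response = {
    "took": 12,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
    "hits": {"total": {"value": 0, "relation": "eq"}, "hits": []},
}

print(summarize_shards(sample_response))
```

The `total` count is the value that plays the role of n in the analysis that follows.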
Look at what repeats when the query runs:
- Primary operation: Each shard runs the query independently.
- How many times: Once per shard in the index.
As the number of shards grows, the query runs on more pieces of data separately.
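The repeated operation can be sketched as a toy scatter-gather loop. This is a simplified model, not how Elasticsearch is implemented: shards are plain Python lists, documents carry a precomputed `score` (real scoring and the separate fetch phase are left out), and the names `search_shard` and `scatter_gather` are assumptions made for this example.

```python
import heapq


def search_shard(shard_docs, term, k):
    """Run the query on one shard and return its local top-k (score, id) pairs."""
    hits = [(doc["score"], doc["id"]) for doc in shard_docs if term in doc["text"]]
    return heapq.nlargest(k, hits)


def scatter_gather(shards, term, k):
    """Query every shard once, then merge the per-shard results into a global top-k."""
    shard_runs = 0
    partial_results = []
    for shard_docs in shards:
        partial_results.extend(search_shard(shard_docs, term, k))
        shard_runs += 1  # one query execution per shard
    return heapq.nlargest(k, partial_results), shard_runs


# Three toy shards with made-up documents and scores.
shards = [
    [{"id": "a", "text": "value one", "score": 0.9},
     {"id": "b", "text": "other", "score": 0.5}],
    [{"id": "c", "text": "value two", "score": 0.7}],
    [{"id": "d", "text": "value three", "score": 0.8},
     {"id": "e", "text": "value four", "score": 0.1}],
]

top_hits, runs = scatter_gather(shards, "value", k=2)
print(top_hits, runs)
```

The loop body runs exactly once per shard, and the merge step handles one partial result list per shard, so both parts of the work scale with the shard count.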
| Input Size (shards) | Approx. Operations |
|---|---|
| 10 | 10 query runs |
| 100 | 100 query runs |
| 1000 | 1000 query runs |
Pattern observation: The total work grows linearly with the number of shards.
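Time Complexity: O(n), where n is the number of shards the query touches
This means the total work to complete the query grows linearly with the number of shards involved. Because shards on different nodes are searched in parallel, wall-clock latency can grow more slowly than the total work, but the coordinating node still has to collect and merge one result set per shard.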
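To make the difference between total work and elapsed time concrete, here is a rough back-of-the-envelope model. Everything in it is an assumption chosen for illustration: a constant per-shard query cost, a fixed number of shard queries that can run at once (`parallel_slots`), and a small merge cost per shard on the coordinating node. It is not a measurement of any real cluster.

```python
import math


def total_shard_runs(num_shards: int) -> int:
    """Total query executions: one per shard, so O(n)."""
    return num_shards


def estimated_latency_ms(num_shards, parallel_slots, per_shard_ms, merge_ms_per_shard):
    """Estimate wall-clock time when shard queries run in parallel 'waves'."""
    waves = math.ceil(num_shards / parallel_slots)
    return waves * per_shard_ms + num_shards * merge_ms_per_shard


# Hypothetical numbers: 20 parallel slots, 10 ms per shard query, 1 ms merge per shard.
for n in (10, 100, 1000):
    latency = estimated_latency_ms(n, parallel_slots=20, per_shard_ms=10, merge_ms_per_shard=1)
    print(n, total_shard_runs(n), latency)
```

Under these assumptions, parallelism lowers the constant factor, but once the shard count exceeds the available parallel slots the estimated latency also grows roughly linearly with n.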
[X] Wrong: "Adding more shards always makes queries faster because work is split."
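[OK] Correct: Each extra shard adds a separate query to run and a result set to merge. Once there are more shards than the cluster can search in parallel, that per-shard overhead can outweigh the benefit of splitting the data and increase total time.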
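The trade-off can be illustrated with a toy cost model for a fixed amount of data split into more and more shards. All constants below (fixed overhead per shard query, scan cost per million documents, merge cost per shard, 16 parallel slots) are hypothetical values picked for this sketch, and `query_latency_ms` is not an Elasticsearch API.

```python
import math


def query_latency_ms(total_docs, num_shards, parallel_slots,
                     fixed_overhead_ms=2.0, ms_per_million_docs=50.0,
                     merge_ms_per_shard=0.5):
    """Toy latency estimate for one query over a fixed dataset split into shards."""
    docs_per_shard = total_docs / num_shards
    # Each shard query pays a fixed overhead plus a cost proportional to its size.
    per_shard_ms = fixed_overhead_ms + ms_per_million_docs * docs_per_shard / 1_000_000
    # Shards beyond the parallel capacity have to wait for a later wave.
    waves = math.ceil(num_shards / parallel_slots)
    return waves * per_shard_ms + merge_ms_per_shard * num_shards


# Same 100 million documents, different shard counts, 16 parallel slots (assumed).
for n in (1, 16, 160, 1600):
    print(n, query_latency_ms(100_000_000, n, parallel_slots=16))
```

In this model the estimate falls sharply when going from 1 shard to 16, because the work is split across the available slots, and then climbs again at 160 and 1600 shards as fixed overhead and merge cost dominate.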
Understanding how Elasticsearch splits work helps you explain performance trade-offs clearly and shows you know how distributed systems behave.
"What if we changed the number of replicas per shard? How would that affect the time complexity of queries?"