Shard sizing strategy in Elasticsearch - Time & Space Complexity
When working with Elasticsearch, how we size shards affects how fast queries and indexing run.
We want to understand how the number and size of shards impact the work Elasticsearch does.
Analyze the time complexity of querying data distributed across shards.
GET /my_index/_search
{
  "query": { "match_all": {} },
  "size": 10
}
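This query is sent to every shard of the index (one copy of each shard, primary or replica). The coordinating node then merges the per-shard results into the final top 10 hits.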
Look at what repeats when Elasticsearch runs this query.
- Primary operation: Searching each shard separately.
- How many times: Once per shard in the index.
As you add more shards, Elasticsearch does more separate searches.
| Number of Shards (n) | Approx. Searches |
|---|---|
| 5 | 5 searches |
| 50 | 50 searches |
| 500 | 500 searches |
Pattern observation: The work grows directly with the number of shards.
Time Complexity: O(n)
This means the number of shard-level searches, and the work of merging their results, grows in a straight line as you add more shards.
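The linear growth can be illustrated with a small simulation. This is a minimal sketch, not how Elasticsearch is implemented internally: the helper names `search_index` and `make_shards` are hypothetical, shards are modeled as plain Python lists, and scores are made-up numbers. It only shows the scatter-gather shape: one search per shard, then a merge of the per-shard top hits.

```python
import heapq


def search_index(shards, size=10):
    """Simulate a scatter-gather search over a list of shards.

    Each shard is a list of (score, doc_id) tuples. Returns the global
    top `size` hits and the number of shard-level searches performed.
    """
    shard_searches = 0
    candidates = []
    for shard in shards:
        # One separate search per shard: this is the repeated operation.
        shard_searches += 1
        candidates.extend(heapq.nlargest(size, shard))
    # The coordinating step merges the per-shard top hits.
    top_hits = heapq.nlargest(size, candidates)
    return top_hits, shard_searches


def make_shards(n_shards, docs_per_shard=3):
    # Hypothetical data: the score is just a running counter.
    return [
        [(float(s * docs_per_shard + d), f"doc-{s}-{d}") for d in range(docs_per_shard)]
        for s in range(n_shards)
    ]


for n in (5, 50, 500):
    _, searches = search_index(make_shards(n))
    print(f"{n} shards -> {searches} shard-level searches")
```

The loop body runs once per shard, so the count of shard-level searches matches the table: 5, 50, and 500.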
[X] Wrong: "More shards always make queries faster because work is split more."
[OK] Correct: Each shard adds overhead, so too many shards can slow things down instead of speeding them up.
Understanding shard sizing helps you design Elasticsearch setups that balance speed and resource use, a key skill for real projects.
"What if we reduce the number of shards but increase the size of each shard? How would the time complexity change?"
Practice
Solution
Step 1: Understand shard purpose
Shards split data to distribute storage and speed up search operations.
Step 2: Connect shard size to performance
Choosing the right shard size balances storage efficiency and search speed.
Final Answer:
To balance data storage and search performance -> Option A
Quick Check:
Shard size affects performance balance = A [OK]
- Thinking replicas control shard size
- Confusing shard count with replica count
- Assuming more shards always improve speed
Solution
Step 1: Identify shard count setting
The setting number_of_shards defines how many primary shards an index has.
Step 2: Differentiate from replicas
number_of_replicas controls copies, not primary shard count.
Final Answer:
number_of_shards -> Option A
Quick Check:
Primary shards = number_of_shards [OK]
- Confusing replicas with shards
- Using shard_size which is not a setting
- Mixing index refresh with shard count
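To make the difference between the two settings concrete, here is a minimal sketch that builds the JSON body for an index-creation request. The helper name `index_settings`, the index name `my_index`, and the values 3 and 1 are illustrative assumptions. The sketch only prints the request and does not contact a cluster.

```python
import json


def index_settings(primary_shards, replicas=1):
    """Build the body for an index-creation request (PUT /<index>).

    number_of_shards sets the primary shard count;
    number_of_replicas sets how many copies each primary gets.
    """
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas,
            }
        }
    }


# Illustrative values only: 3 primaries, 1 replica of each.
body = index_settings(3, replicas=1)
print("PUT /my_index")
print(json.dumps(body, indent=2))
```

Note that number_of_shards is fixed when the index is created, while number_of_replicas can be changed later, which is one reason the primary shard count deserves planning up front.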
Solution
Step 1: Calculate total size from shards
Total size = number of shards x size per shard = 5 x 20GB = 100GB.
Step 2: Confirm no replicas included
Replicas add copies but do not affect primary data size calculation here.
Final Answer:
100GB -> Option B
Quick Check:
5 shards x 20GB = 100GB [OK]
- Adding replica size to primary data size
- Confusing shard count with replica count
- Choosing shard size instead of total
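The arithmetic in this solution can be checked with a short sketch. The function names are hypothetical, and the replica case is included only to show why replicas are kept out of the primary data size: they multiply disk usage, not the amount of primary data.

```python
def primary_data_gb(primary_shards, gb_per_shard):
    """Primary data size: number of primary shards x size per shard."""
    return primary_shards * gb_per_shard


def total_disk_gb(primary_shards, gb_per_shard, replicas):
    """Disk used by primaries plus all replica copies."""
    return primary_data_gb(primary_shards, gb_per_shard) * (1 + replicas)


print(primary_data_gb(5, 20))            # primary data only
print(total_disk_gb(5, 20, replicas=1))  # primaries plus one replica copy each
```

With 5 shards of 20GB each, the primary data is 100GB. Adding one replica doubles the disk footprint to 200GB without changing the primary data size.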
number_of_shards to 1 but your data size grows to 200GB. What is the main problem with this shard sizing?
Solution
Step 1: Analyze shard size impact
One shard holding 200GB is large and can slow down search and indexing.
Step 2: Identify correct problem
Too few shards for large data causes performance issues, not replica count or refresh interval.
Final Answer:
Shard size is too large, causing slower search and indexing -> Option D
Quick Check:
Large shard size = slower performance [OK]
- Blaming replica count instead of shard size
- Thinking many shards cause this problem
- Ignoring shard size impact on speed
Solution
Step 1: Calculate shard count range
Minimum shards = 500GB / 40GB = 12.5, rounded up to 13 shards; maximum shards = 500GB / 10GB = 50 shards.
Step 2: Choose shard count within range
To keep shard size between 10GB and 40GB, choose a shard count between 13 and 50; 50 shards gives exactly 10GB per shard.
Final Answer:
50 shards -> Option C
Quick Check:
500GB ÷ 50 shards = 10GB per shard [OK]
- Choosing too few shards causing large shard size
- Choosing too many shards causing overhead
- Ignoring shard size limits
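The range calculation generalizes to any data size and target shard size. The sketch below is a minimal illustration with a hypothetical helper, `shard_count_range`. It assumes data is spread evenly across shards, which real indices only approximate.

```python
import math


def shard_count_range(total_gb, min_shard_gb, max_shard_gb):
    """Return (fewest, most) primary shards that keep each shard
    between min_shard_gb and max_shard_gb, assuming even distribution."""
    fewest = math.ceil(total_gb / max_shard_gb)   # fewer shards would exceed the max size
    most = math.floor(total_gb / min_shard_gb)    # more shards would fall below the min size
    return fewest, most


fewest, most = shard_count_range(500, min_shard_gb=10, max_shard_gb=40)
print(fewest, most)
print(500 / most)  # size per shard at the upper end of the range
```

For 500GB with a 10GB to 40GB target, this gives 13 to 50 shards. Rounding up for the lower bound matters: 12 shards would put each shard at about 41.7GB, just over the limit.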
