What is number_of_shards in Elasticsearch: Explanation and Example
The number_of_shards setting in Elasticsearch defines how many primary shards an index is split into for storage and search. Each shard holds a subset of the index's documents, which lets Elasticsearch distribute data across nodes and run operations in parallel.
How It Works
Imagine you have a big book that you want to share with friends so they can read it faster. Instead of giving the whole book to one person, you split it into several smaller chapters and give each friend a chapter. In Elasticsearch, number_of_shards works like these chapters. It splits your data index into smaller parts called shards.
Each shard is a self-contained piece of the index that can be stored on different servers or nodes. This splitting helps Elasticsearch search and store data faster because it can work on many shards at the same time, just like your friends reading different chapters in parallel.
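The splitting described above can be sketched in a few lines of Python. This is a simplified model, not Elasticsearch's actual code: Elasticsearch hashes a document's routing value (its ID by default) with Murmur3, while this sketch substitutes zlib.crc32 from the standard library, and the names shard_for and distribute are made up for illustration.

```python
import zlib
from collections import Counter


def shard_for(doc_id: str, number_of_shards: int) -> int:
    """Pick a shard for a document (simplified model).

    Assumption: crc32 stands in for the Murmur3 hash Elasticsearch uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


def distribute(doc_ids, number_of_shards):
    """Return how many documents land on each shard, indexed by shard number."""
    counts = Counter(shard_for(doc_id, number_of_shards) for doc_id in doc_ids)
    return [counts.get(shard, 0) for shard in range(number_of_shards)]


# 10,000 hypothetical document IDs spread over 3 shards.
doc_ids = [f"doc-{i}" for i in range(10_000)]
per_shard = distribute(doc_ids, 3)
print(per_shard)  # three counts that add up to 10,000
```

Because the shard is derived from a hash, documents spread fairly evenly, and each shard can be searched independently of the others.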
Since version 7.0, Elasticsearch creates 1 primary shard per index by default (versions before 7.0 defaulted to 5). You can set number_of_shards to a different value when you create the index to match your data size and cluster setup.
Example
This example creates an Elasticsearch index with 3 primary shards using the number_of_shards setting. The number_of_replicas setting adds 1 replica copy of each primary shard.
PUT /my_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
When to Use
Use number_of_shards when you create a new index to control how Elasticsearch splits your data. If you expect a large amount of data or high search traffic, increasing the number of shards can help distribute the load and improve speed.
For example, a company storing millions of customer records might use multiple shards so that searches and updates are spread across several nodes. However, every shard costs memory, file handles, and cluster-state bookkeeping, so too many small shards add overhead. Balance is important.
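As a rough starting point for that balance, the shard count can be estimated from the expected index size. The sketch below assumes a target of about 30 GB per shard, chosen from the 10 GB to 50 GB range that Elastic's sizing guidance commonly cites; the function name suggest_shard_count and the default target are illustrative assumptions, not an Elasticsearch API.

```python
import math


def suggest_shard_count(expected_index_gb: float, target_shard_gb: float = 30.0) -> int:
    """Estimate a primary shard count from the expected index size.

    Assumption: target_shard_gb is a rule-of-thumb value, not a hard limit.
    """
    if expected_index_gb < 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be non-negative, and the target must be positive")
    # Always keep at least one shard, and round up so no shard exceeds the target.
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


print(suggest_shard_count(5))    # small index: a single shard is enough
print(suggest_shard_count(100))  # larger index: several shards
```

Treat the result as a first guess to validate with real data and query load, since the right number also depends on node count and hardware.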
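Remember, you cannot change the number of primary shards of an existing index in place. Changing it means building a new index, for example with the split or shrink APIs or by reindexing, so plan this setting carefully based on your data size and growth expectations.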
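One way to see why the shard count is fixed: a document's shard is derived from a hash of its routing value modulo the number of shards, so changing that number would send lookups for existing documents to the wrong shard. The sketch below is a simplified model under stated assumptions: zlib.crc32 stands in for the Murmur3 hash Elasticsearch uses, the real formula has an extra routing-shards factor, and shard_for is a hypothetical helper.

```python
import zlib


def shard_for(doc_id: str, number_of_shards: int) -> int:
    """Simplified routing: hash of the document ID modulo the shard count."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


doc_ids = [f"doc-{i}" for i in range(10_000)]

# Count documents whose computed shard differs between 3 and 4 shards.
moved = sum(1 for doc_id in doc_ids if shard_for(doc_id, 3) != shard_for(doc_id, 4))
fraction_moved = moved / len(doc_ids)
print(f"{fraction_moved:.0%} of documents would be looked up on a different shard")
```

With a uniform hash, about three quarters of the documents map to a different shard when going from 3 to 4 shards, which is why the data has to be rewritten into a new index instead.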
Key Points
- number_of_shards splits an index into smaller parts for better performance.
- Each shard is a self-contained Lucene index that can be stored on a different node.
- Set this value when creating an index; it cannot be changed in place later.
- More shards can increase parallelism, but too many add overhead.
- Balance shard count based on data size and cluster capacity.