
Optimize Indexing Performance in Elasticsearch: Best Practices

To optimize indexing performance in Elasticsearch, use the bulk API to cut per-request overhead, increase refresh_interval during heavy indexing, and temporarily set the number of replicas to 0, restoring both settings once the load finishes. Also define explicit mappings and leave out fields you do not need, so each document takes less work to index.

Syntax

Key settings and APIs to optimize indexing include:

  • bulk API: Send multiple documents in one request to reduce overhead.
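  • refresh_interval: Controls how often Elasticsearch refreshes the index to make newly indexed documents searchable (default 1s); a longer interval means fewer refreshes during a load.
  • number_of_replicas: Number of replica copies of each primary shard; every replica repeats the indexing work, so reducing this speeds up indexing.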
  • mapping: Define fields and types to avoid costly dynamic mapping.
json
POST /_bulk
{ "index" : {"_index" : "my_index", "_id" : "1"} }
{ "field1" : "value1" }
{ "index" : {"_index" : "my_index", "_id" : "2"} }
{ "field1" : "value2" }

PUT /my_index/_settings
{
  "refresh_interval" : "30s",
  "number_of_replicas" : 0
}
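
The bulk body is newline-delimited JSON rather than a single JSON document, which is easy to get wrong when assembling it by hand. As a minimal sketch, assuming documents arrive as (id, source) pairs, the body could be built in Python like this. `build_bulk_body` is a hypothetical helper for illustration, not part of any Elasticsearch client.

```python
import json


def build_bulk_body(index, docs):
    """Return an NDJSON string suitable as the body of POST /_bulk.

    `docs` is an iterable of (doc_id, source_dict) pairs. The helper name
    and argument shape are assumptions made for this sketch.
    """
    lines = []
    for doc_id, source in docs:
        # Action line: says what to do with the line that follows.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "my_index",
    [("1", {"field1": "value1"}), ("2", {"field1": "value2"})],
)
print(body, end="")
```

The resulting string would be sent as the body of POST /_bulk with the header Content-Type: application/x-ndjson, using whichever HTTP client you prefer.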

Example

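This example bulk-indexes two documents into the products index (creating it if it does not exist) and then relaxes the index settings ahead of a larger load. Once the load finishes, set refresh_interval and number_of_replicas back to their normal values.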

json
POST /_bulk
{ "index": {"_index": "products", "_id": "1"} }
{ "name": "Laptop", "price": 1200 }
{ "index": {"_index": "products", "_id": "2"} }
{ "name": "Phone", "price": 800 }

PUT /products/_settings
{
  "refresh_interval": "60s",
  "number_of_replicas": 0
}
Output (bulk request, abbreviated)
{
  "took": 30,
  "errors": false,
  "items": [
    { "index": { "_index": "products", "_id": "1", "status": 201 } },
    { "index": { "_index": "products", "_id": "2", "status": 201 } }
  ]
}
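
A bulk request can return HTTP 200 even when some of its items failed, so it is worth checking the errors flag and each item's status rather than the HTTP code alone. The following is a rough sketch of such a check over a parsed response. `failed_items` is a hypothetical helper, and treating any status of 300 or above as a failure is an assumption of this sketch.

```python
def failed_items(bulk_response):
    """Return (id, status, error) for every bulk item that did not succeed."""
    # Fast path: the top-level flag is false when every item succeeded.
    if not bulk_response.get("errors"):
        return []
    failures = []
    for item in bulk_response.get("items", []):
        # Each item has one key naming the action (index, create, update, delete).
        for result in item.values():
            status = result.get("status", 0)
            if status >= 300:  # assumed threshold for "failed"
                failures.append((result.get("_id"), status, result.get("error")))
    return failures


response = {
    "took": 30,
    "errors": False,
    "items": [
        {"index": {"_index": "products", "_id": "1", "status": 201}},
        {"index": {"_index": "products", "_id": "2", "status": 201}},
    ],
}
print(failed_items(response))  # prints []
```

Failed items can then be logged or retried individually instead of resending the whole batch.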

Common Pitfalls

Common mistakes that hurt indexing performance include:

  • Indexing documents one by one instead of using the bulk API.
  • Keeping refresh_interval too low (default 1s) during heavy indexing, causing frequent costly refreshes.
  • Having replicas enabled during bulk indexing, which duplicates work.
  • Using dynamic mappings that create many fields on the fly, increasing overhead.
json
# Wrong: Single document indexing
POST /my_index/_doc
{ "field": "value" }

# Right: Bulk indexing
POST /_bulk
{ "index": {"_index": "my_index"} }
{ "field": "value1" }
{ "index": {"_index": "my_index"} }
{ "field": "value2" }
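
Moving from single-document requests to bulk requests usually means splitting a stream of documents into fixed-size batches. Here is one way this might look, using only the Python standard library. The `batches` helper and the batch size of 2 are illustrative assumptions; a workable size depends on document size and cluster capacity and is best found by measurement.

```python
from itertools import islice


def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


docs = [{"field": f"value{i}"} for i in range(1, 6)]
for batch in batches(docs, 2):
    # Each batch would become one bulk request instead of len(batch) requests.
    print(len(batch))
```

Because the helper consumes an iterator lazily, it also works for inputs too large to hold in memory at once.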

Quick Reference

  • Use bulk API: Batch multiple documents per request.
  • Increase refresh_interval: Set to 30s or more during indexing, then restore it afterwards.
  • Set replicas to 0: Disable replicas temporarily and re-enable them once the load is done.
  • Optimize mappings: Define explicit fields, avoid dynamic mapping.
  • Disable unnecessary indexing features: For example _source, but only if you will never need the update or reindex APIs, which depend on it.
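
The points above can be combined into two request bodies: one used when creating the index for a load, and one that restores normal settings afterwards. This is a sketch under stated assumptions: the field types for the products documents are guesses based on the example values, and one replica after the load is an assumed target, not a recommendation.

```python
import json

# Body for PUT /products (index creation before the load).
create_index_body = {
    "settings": {
        "index": {"refresh_interval": "60s", "number_of_replicas": 0}
    },
    "mappings": {
        # "strict" rejects documents containing fields not mapped below.
        "dynamic": "strict",
        "properties": {
            "name": {"type": "text"},      # assumed type
            "price": {"type": "integer"},  # assumed type
        },
    },
}

# Body for PUT /products/_settings once the load has finished.
# A null refresh_interval resets it to the default.
restore_settings_body = {
    "index": {"refresh_interval": None, "number_of_replicas": 1}
}

print(json.dumps(create_index_body, indent=2))
print(json.dumps(restore_settings_body))
```

With a strict mapping, a document containing an unexpected field is rejected instead of silently adding a new field to the mapping, which keeps the mapping from growing during a load.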

Key Takeaways

Use the bulk API to reduce overhead and speed up indexing.
Increase the refresh_interval during heavy indexing to reduce costly refreshes.
Temporarily disable replicas to avoid duplicate indexing work, and re-enable them after the load.
Define explicit mappings to avoid expensive dynamic field creation.
Disable unnecessary features like _source only if you will never need the update or reindex APIs, which depend on it.