
How to Bulk Index Documents in Elasticsearch Quickly

To bulk index documents in Elasticsearch, send a newline-delimited JSON (NDJSON) payload of action and data line pairs to the _bulk API endpoint. Each document needs an index action line followed by a line holding the document itself, so many documents can be inserted in a single request.
📐

Syntax

The bulk API requires newline-delimited JSON (NDJSON), in which every JSON object sits on a single line of its own. An action line names the operation (such as index), and the line that follows it holds the document data. This pattern repeats for each document.

  • Action line: Specifies the operation, the target index, and optionally the document ID.
  • Data line: Contains the JSON document to be indexed.
json
{ "index" : { "_index" : "my_index", "_id" : "1" } }
{ "field1" : "value1", "field2" : "value2" }
{ "index" : { "_index" : "my_index", "_id" : "2" } }
{ "field1" : "value3", "field2" : "value4" }
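The action/data pattern above can also be generated from code rather than typed by hand. The following is a minimal Python sketch using only the standard library; the helper name build_bulk_payload and its (doc_id, document) input shape are assumptions made for this illustration, not part of Elasticsearch or of any client library.

```python
import json


def build_bulk_payload(index_name, docs):
    """Return an NDJSON string of index actions for (doc_id, document) pairs.

    Hypothetical helper for illustration only.
    """
    lines = []
    for doc_id, doc in docs:
        # Action line: the operation plus the target index and document ID.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Data line: the document itself, serialized onto a single line.
        lines.append(json.dumps(doc))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


payload = build_bulk_payload("my_index", [
    ("1", {"field1": "value1", "field2": "value2"}),
    ("2", {"field1": "value3", "field2": "value4"}),
])
print(payload, end="")
```

Because json.dumps without an indent argument never emits raw line breaks, each action and each document stays on one line, which is what the NDJSON format needs.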
💻

Example

This example shows how to bulk index two documents into an index named my_index using a curl command. It demonstrates the required NDJSON format and an abbreviated version of the response from Elasticsearch; a real response carries additional fields for each item.

bash
curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/x-ndjson' -d '
{ "index" : { "_index" : "my_index", "_id" : "1" } }
{ "name" : "Alice", "age" : 30 }
{ "index" : { "_index" : "my_index", "_id" : "2" } }
{ "name" : "Bob", "age" : 25 }
'
Output
{
  "took" : 5,
  "errors" : false,
  "items" : [
    { "index" : { "_index" : "my_index", "_id" : "1", "status" : 201 } },
    { "index" : { "_index" : "my_index", "_id" : "2", "status" : 201 } }
  ]
}
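For readers who prefer a script to curl, the same request can be sketched in Python with the standard library alone. This assumes the same unsecured cluster at localhost:9200 as the curl command; the function names make_bulk_request and send_bulk are hypothetical, and a cluster with security enabled would also need HTTPS and credentials, which this sketch leaves out.

```python
import json
import urllib.request


def make_bulk_request(payload, base_url="http://localhost:9200"):
    """Build (but do not send) a POST request to the _bulk endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/_bulk",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


def send_bulk(payload, base_url="http://localhost:9200"):
    """Send the payload and return the parsed JSON response."""
    with urllib.request.urlopen(make_bulk_request(payload, base_url)) as resp:
        return json.load(resp)


# Same two documents as the curl example; note the trailing newline.
payload = (
    '{ "index" : { "_index" : "my_index", "_id" : "1" } }\n'
    '{ "name" : "Alice", "age" : 30 }\n'
    '{ "index" : { "_index" : "my_index", "_id" : "2" } }\n'
    '{ "name" : "Bob", "age" : 25 }\n'
)
request = make_bulk_request(payload)
# send_bulk(payload) would perform the call; it needs a reachable cluster.
```

Building the request separately from sending it makes it possible to inspect the URL, headers, and body without a running cluster.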
⚠️

Common Pitfalls

Common mistakes when bulk indexing include:

  • Spreading a JSON object over several lines (pretty-printing), which causes parsing errors; each action and each document must sit on exactly one line.
  • Missing the action line before each document.
  • Sending the entire bulk payload as a single JSON array instead of NDJSON.
  • Not setting the Content-Type header to application/x-ndjson.

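Always ensure each action line is followed by its document line and that the payload ends with a newline. When sending the payload from a file with curl, use --data-binary instead of -d, because -d strips newlines from file input.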

json
Wrong way:
[
  { "index" : { "_index" : "my_index", "_id" : "1" } },
  { "name" : "Alice", "age" : 30 }
]

Right way:
{ "index" : { "_index" : "my_index", "_id" : "1" } }
{ "name" : "Alice", "age" : 30 }
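Several of these mistakes can be caught before the request is sent. The sketch below is a rough pre-flight check in Python, assuming a payload that contains only index actions; check_bulk_payload is a hypothetical helper, and its test for action lines is a heuristic (a document that itself has a top-level field named index would pass it).

```python
import json


def check_bulk_payload(payload):
    """Return a list of problems found in an index-only bulk payload."""
    problems = []
    if payload.lstrip().startswith("["):
        problems.append("payload is a JSON array, not NDJSON")
        return problems
    if not payload.endswith("\n"):
        problems.append("payload does not end with a newline")
    # Blank lines are skipped, so line numbers below count non-blank lines.
    lines = [line for line in payload.split("\n") if line.strip()]
    if len(lines) % 2 != 0:
        problems.append("odd number of lines: an action or data line is missing")
    for number, line in enumerate(lines, start=1):
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {number} is not valid single-line JSON")
            continue
        # Odd-numbered lines should be action lines such as {"index": {...}}.
        if number % 2 == 1 and (not isinstance(parsed, dict) or "index" not in parsed):
            problems.append(f"line {number} should be an index action line")
    return problems


wrong = (
    '[\n'
    '  { "index" : { "_index" : "my_index", "_id" : "1" } },\n'
    '  { "name" : "Alice", "age" : 30 }\n'
    ']\n'
)
right = (
    '{ "index" : { "_index" : "my_index", "_id" : "1" } }\n'
    '{ "name" : "Alice", "age" : 30 }\n'
)
print(check_bulk_payload(wrong))  # ['payload is a JSON array, not NDJSON']
print(check_bulk_payload(right))  # []
```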
📊

Quick Reference

Tips for bulk indexing:

  • Use the _bulk API endpoint.
  • Format data as newline-delimited JSON (NDJSON).
  • Each document requires an action line and a data line.
  • Set Content-Type to application/x-ndjson.
  • Check the response for errors after bulk indexing.
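The last tip matters because a bulk request can return HTTP 200 while individual items fail; the top-level errors flag and the per-item status values carry that information. A minimal Python sketch of such a check follows. The function name failed_items is hypothetical, and bad_response is a hand-written sample for illustration, not captured Elasticsearch output.

```python
def failed_items(response):
    """Return (position, action, id, status, error) for each failed item."""
    failures = []
    if not response.get("errors"):
        return failures
    for position, item in enumerate(response.get("items", [])):
        # Each item has a single key naming the action, e.g. "index".
        for action, result in item.items():
            if result.get("status", 0) >= 300 or "error" in result:
                failures.append((
                    position,
                    action,
                    result.get("_id"),
                    result.get("status"),
                    result.get("error"),
                ))
    return failures


# The abbreviated success response from the example section.
ok_response = {
    "took": 5,
    "errors": False,
    "items": [
        {"index": {"_index": "my_index", "_id": "1", "status": 201}},
        {"index": {"_index": "my_index", "_id": "2", "status": 201}},
    ],
}

# Hand-written illustrative failure: the second item was rejected.
bad_response = {
    "took": 5,
    "errors": True,
    "items": [
        {"index": {"_index": "my_index", "_id": "1", "status": 201}},
        {"index": {"_index": "my_index", "_id": "2", "status": 400,
                   "error": {"type": "mapper_parsing_exception",
                             "reason": "illustrative reason text"}}},
    ],
}

print(failed_items(ok_response))        # []
print(len(failed_items(bad_response)))  # 1
```

The position in each tuple matches the order of the action lines in the request, so a failed item can be traced back to the document that caused it.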

Key Takeaways

  • Use the Elasticsearch _bulk API with newline-delimited JSON to index multiple documents efficiently.
  • Each document must be preceded by an action line specifying the index and optional document ID.
  • Always set the Content-Type header to application/x-ndjson when sending bulk requests.
  • Check the bulk API response for errors to ensure all documents indexed successfully.
  • Avoid sending bulk data as a JSON array; use NDJSON format instead.