
How to Use the Scroll API in Elasticsearch for Large Data Retrieval

Call the _search endpoint with a scroll parameter to start scrolling, then pass the returned _scroll_id to the _search/scroll endpoint to fetch each subsequent batch. This lets you retrieve a result set that is too large for a single response, one batch at a time.
📐

Syntax

The Scroll API works in two steps: first, run a search with a scroll parameter, which keeps the search context alive for the given time and returns a _scroll_id; then repeatedly call the _search/scroll endpoint with that _scroll_id to get the next batch of results.

  • Initial search: POST /index/_search?scroll=1m with a query and size.
  • Scroll request: POST /_search/scroll with JSON body containing scroll and scroll_id.
json
POST /my_index/_search?scroll=1m
{
  "size": 100,
  "query": {
    "match_all": {}
  }
}

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA..."
}
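
The two requests above can also be sketched as plain Python helpers. This is a minimal illustration, not part of any Elasticsearch client: the function names start_scroll_request and next_scroll_request are hypothetical, and the helpers only build the method, path, and JSON body without sending anything.

```python
import json


def start_scroll_request(index, query, size=100, keep_alive="1m"):
    """Build the initial search request that opens a scroll context."""
    path = f"/{index}/_search?scroll={keep_alive}"
    body = {"size": size, "query": query}
    return "POST", path, body


def next_scroll_request(scroll_id, keep_alive="1m"):
    """Build a follow-up request that fetches the next batch."""
    body = {"scroll": keep_alive, "scroll_id": scroll_id}
    return "POST", "/_search/scroll", body


if __name__ == "__main__":
    method, path, body = start_scroll_request("my_index", {"match_all": {}})
    print(method, path)
    print(json.dumps(body, indent=2))
```

Keeping request construction separate from the HTTP call makes it easy to check the payloads before pointing them at a real cluster.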
💻

Example

This example shows how to retrieve all documents from an index named my_index in batches of 2 using the Scroll API.

json
POST /my_index/_search?scroll=1m
{
  "size": 2,
  "query": {
    "match_all": {}
  }
}

# Response contains hits and a _scroll_id

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA..."
}

# Repeat scroll requests until hits are empty
Output
{
  "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA...",
  "hits": {
    "hits": [
      {"_id": "1", "_source": {"field": "value1"}},
      {"_id": "2", "_source": {"field": "value2"}}
    ]
  }
}

# Next scroll call returns the next batch, or empty hits when done
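
The "repeat until hits are empty" loop can be sketched in Python as a generator. This is an illustration under assumptions: send is a hypothetical callable that takes (method, path, body) and returns the parsed JSON response, standing in for whatever HTTP client you use. Each follow-up request passes the _scroll_id from the most recent response, since the ID may change between calls.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical transport: (method, path, body) -> parsed JSON response.
Send = Callable[[str, str, dict], dict]


def scroll_batches(
    send: Send,
    index: str,
    query: dict,
    size: int = 2,
    keep_alive: str = "1m",
) -> Iterator[List[Dict]]:
    """Yield one list of hits per batch until a batch comes back empty."""
    resp = send(
        "POST",
        f"/{index}/_search?scroll={keep_alive}",
        {"size": size, "query": query},
    )
    while True:
        hits = resp["hits"]["hits"]
        if not hits:
            return  # empty batch: nothing left to read
        yield hits
        # Always use the most recently returned _scroll_id.
        resp = send(
            "POST",
            "/_search/scroll",
            {"scroll": keep_alive, "scroll_id": resp["_scroll_id"]},
        )
```

A caller would iterate with something like `for batch in scroll_batches(send, "my_index", {"match_all": {}}): ...` and process each batch before the keep-alive window runs out.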
⚠️

Common Pitfalls

  • Omitting the scroll parameter from the initial search runs an ordinary search: no scroll context is created and no _scroll_id is returned.
  • Using an expired scroll_id results in an error; send each scroll request within the keep-alive time. Every scroll request resets that timer, so it only needs to be long enough to process one batch.
  • Not clearing the scroll context when you finish holds resources until it times out; use DELETE /_search/scroll to release it right away.
  • Setting size too large can cause memory issues; use reasonable batch sizes.
json
### Wrong: Missing scroll parameter
POST /my_index/_search
{
  "size": 100,
  "query": { "match_all": {} }
}

### Right: Include scroll parameter
POST /my_index/_search?scroll=1m
{
  "size": 100,
  "query": { "match_all": {} }
}

### Clear scroll after use
DELETE /_search/scroll
{
  "scroll_id" : ["DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA..."]
}
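
One way to make sure the scroll context is always cleared is to wrap the loop in try/finally. The sketch below assumes the same hypothetical send(method, path, body) transport that returns parsed JSON; collect_ids is an illustrative name, not a client library function. The DELETE request is sent even if a scroll request fails partway through.

```python
def collect_ids(send, index, keep_alive="1m", size=100):
    """Collect every document _id from an index, then clear the scroll context."""
    scroll_id = None
    ids = []
    try:
        resp = send(
            "POST",
            f"/{index}/_search?scroll={keep_alive}",
            {"size": size, "query": {"match_all": {}}},
        )
        while True:
            # Remember the latest _scroll_id so cleanup targets the right context.
            scroll_id = resp.get("_scroll_id", scroll_id)
            hits = resp["hits"]["hits"]
            if not hits:
                break
            ids.extend(hit["_id"] for hit in hits)
            resp = send(
                "POST",
                "/_search/scroll",
                {"scroll": keep_alive, "scroll_id": scroll_id},
            )
    finally:
        # Runs on success and on error, as long as a context was opened.
        if scroll_id is not None:
            send("DELETE", "/_search/scroll", {"scroll_id": [scroll_id]})
    return ids
```

Clearing explicitly frees the context as soon as the work is done instead of waiting for the keep-alive time to expire.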
📊

Quick Reference

Action          | Endpoint                      | Description
Start scroll    | POST /index/_search?scroll=1m | Begin scrolling with a time to keep context alive
Continue scroll | POST /_search/scroll          | Fetch next batch using scroll_id
Clear scroll    | DELETE /_search/scroll        | Clear scroll context to free resources
Set batch size  | size parameter                | Number of results per batch

Key Takeaways

  • Always include the scroll parameter in the initial search to create the scroll context and keep it alive.
  • Use the most recently returned _scroll_id to fetch subsequent batches until no more results remain.
  • Clear the scroll context after finishing to avoid resource leaks.
  • Choose a reasonable batch size to balance performance and memory use.
  • The Scroll API is designed for processing large result sets in bulk, not for real-time user queries. For deep pagination, recent Elasticsearch documentation recommends search_after with a point in time (PIT) instead.