
How to Use Scroll API in Elasticsearch for Large Data Retrieval

Send a search request to the _search endpoint with a scroll parameter to open a scroll context, then pass the returned _scroll_id to the _search/scroll endpoint to fetch each subsequent batch. This lets you retrieve a large result set in manageable batches instead of one oversized response.

📐

Syntax

The Scroll API works in two steps. First, run a search with a scroll parameter, which sets how long Elasticsearch keeps the search context alive. Then repeatedly call the _search/scroll endpoint, passing the _scroll_id from the most recent response as scroll_id, to get the next batch of results.

Initial search: POST /index/_search?scroll=1m with a query and size.
Scroll request: POST /_search/scroll with JSON body containing scroll and scroll_id.

json

POST /my_index/_search?scroll=1m
{
  "size": 100,
  "query": {
    "match_all": {}
  }
}

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA..."
}
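As a rough illustration, the two requests above could be built from Python with only the standard library. This is a minimal sketch, not an official client: BASE_URL (a local, unsecured cluster) and the helper names build_initial_search and build_scroll_request are assumptions made for this example.

```python
import json
import urllib.request

# Assumption: a local cluster without authentication or TLS.
BASE_URL = "http://localhost:9200"


def _post(path, body):
    """Build (but do not send) a JSON POST request for the given path."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_initial_search(index, size, keep_alive="1m"):
    """Step 1: POST /<index>/_search?scroll=<keep_alive> with a query and size."""
    body = {"size": size, "query": {"match_all": {}}}
    return _post(f"/{index}/_search?scroll={keep_alive}", body)


def build_scroll_request(scroll_id, keep_alive="1m"):
    """Step 2: POST /_search/scroll with scroll and scroll_id in the body."""
    return _post("/_search/scroll", {"scroll": keep_alive, "scroll_id": scroll_id})
```

Each request object can then be passed to urllib.request.urlopen, and the response body parsed with json.load to read hits and _scroll_id.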

💻

Example

This example shows how to retrieve all documents from an index named my_index in batches of 2 using the Scroll API.

json

POST /my_index/_search?scroll=1m
{
  "size": 2,
  "query": {
    "match_all": {}
  }
}

# Response contains hits and a scroll_id

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA..."
}

# Repeat scroll requests until hits are empty

Output

{
  "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA...",
  "hits": {
    "hits": [
      { "_id": "1", "_source": { "field": "value1" } },
      { "_id": "2", "_source": { "field": "value2" } }
    ]
  }
}

# The next scroll call returns the next batch, or empty hits when done
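The "repeat until hits are empty" loop can be sketched in Python as a generator. To keep the sketch independent of any particular HTTP client, it takes a send(method, path, body) callable that returns the parsed JSON response. That callable, the scroll_hits name, and the in-memory make_fake_send stand-in are all assumptions made for illustration.

```python
def scroll_hits(send, index, size=2, keep_alive="1m"):
    """Yield every hit from an index, one scroll batch at a time."""
    response = send(
        "POST",
        f"/{index}/_search?scroll={keep_alive}",
        {"size": size, "query": {"match_all": {}}},
    )
    # An empty hits array signals that the result set is exhausted.
    while response["hits"]["hits"]:
        yield from response["hits"]["hits"]
        # Always pass the most recently returned _scroll_id.
        response = send(
            "POST",
            "/_search/scroll",
            {"scroll": keep_alive, "scroll_id": response["_scroll_id"]},
        )


def make_fake_send(docs):
    """Hypothetical in-memory stand-in for a cluster, for trying the loop."""
    state = {"pos": 0, "size": 0}

    def send(method, path, body):
        if "scroll=" in path:  # initial search: reset the cursor
            state["pos"], state["size"] = 0, body["size"]
        batch = docs[state["pos"]:state["pos"] + state["size"]]
        state["pos"] += len(batch)
        return {"_scroll_id": "fake-scroll-id", "hits": {"hits": batch}}

    return send


if __name__ == "__main__":
    docs = [{"_id": str(i), "_source": {"field": f"value{i}"}} for i in range(1, 6)]
    for hit in scroll_hits(make_fake_send(docs), "my_index", size=2):
        print(hit["_id"], hit["_source"]["field"])
```

Against a real cluster, send would wrap whatever HTTP client is in use. The loop itself only depends on the hits.hits and _scroll_id fields shown in the output above.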

⚠️

Common Pitfalls

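Omitting the scroll parameter from the initial search means no scroll context is created and no _scroll_id is returned, so there is nothing to continue from.
Using an expired scroll_id results in an error; send the next scroll request within the keep-alive window. Each scroll request resets the timeout to the value of its scroll parameter, so the window only needs to cover the processing of one batch.
Leaving the scroll context open after you finish wastes resources until it times out; clear it with DELETE /_search/scroll.
Setting size too large can cause memory issues; use reasonable batch sizes.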

json

### Wrong: Missing scroll parameter
POST /my_index/_search
{
  "size": 100,
  "query": { "match_all": {} }
}

### Right: Include scroll parameter
POST /my_index/_search?scroll=1m
{
  "size": 100,
  "query": { "match_all": {} }
}

### Clear scroll after use
DELETE /_search/scroll
{
  "scroll_id" : ["DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA..."]
}
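One way to make sure the scroll context is always cleared, even when processing a batch fails, is to put the DELETE call in a finally block. The sketch below assumes a hypothetical send(method, path, body) callable that performs the HTTP request and returns parsed JSON; the function name process_with_cleanup is also an assumption for this example.

```python
def process_with_cleanup(send, index, handle_batch, size=100, keep_alive="1m"):
    """Scroll through an index, then clear the scroll context no matter what."""
    scroll_id = None
    try:
        response = send(
            "POST",
            f"/{index}/_search?scroll={keep_alive}",
            {"size": size, "query": {"match_all": {}}},
        )
        while True:
            # Remember the latest id so the finally block can clear it.
            scroll_id = response.get("_scroll_id", scroll_id)
            hits = response["hits"]["hits"]
            if not hits:
                break
            handle_batch(hits)
            response = send(
                "POST",
                "/_search/scroll",
                {"scroll": keep_alive, "scroll_id": scroll_id},
            )
    finally:
        # Runs on normal completion and on errors raised by handle_batch.
        if scroll_id is not None:
            send("DELETE", "/_search/scroll", {"scroll_id": [scroll_id]})
```

If the initial search itself fails, no _scroll_id was ever returned, so the sketch skips the DELETE in that case.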

📊

Quick Reference

Action	Endpoint	Description
Start scroll	POST /index/_search?scroll=1m	Begin scrolling with a time to keep context alive
Continue scroll	POST /_search/scroll	Fetch next batch using scroll_id
Clear scroll	DELETE /_search/scroll	Clear scroll context to free resources
Set batch size	size parameter	Number of results per batch
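To get a feel for how the size parameter affects the number of round trips, the count can be estimated with a little arithmetic. This sketch assumes every batch except possibly the last is full and that the client stops at the first empty batch; scroll_request_count is a hypothetical helper, and the final DELETE call is not counted.

```python
def scroll_request_count(total_docs, size):
    """Estimate search + scroll requests needed to drain total_docs documents."""
    if size <= 0:
        raise ValueError("size must be positive")
    full_or_partial_batches = (total_docs + size - 1) // size  # integer ceiling
    # One extra request returns the empty batch that ends the loop.
    return full_or_partial_batches + 1


if __name__ == "__main__":
    for size in (100, 1000, 5000):
        print(size, scroll_request_count(1_000_000, size))
```

Larger batches mean fewer requests but more memory per response, which is the trade-off behind choosing a reasonable batch size.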

✅

Key Takeaways

Always include the scroll parameter in the initial search to keep the context alive.

Use the scroll_id returned to fetch subsequent batches until no more results remain.

Clear the scroll context after finishing to avoid resource leaks.

Choose a reasonable batch size to balance performance and memory use.

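The Scroll API is designed for retrieving large result sets in bulk, not for real-time user queries. For deep pagination, recent Elasticsearch documentation recommends search_after with a point in time (PIT) instead.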