How to Use Scroll API in Elasticsearch for Large Data Retrieval
Use the _search endpoint with a scroll parameter to start scrolling, then use the _search/scroll endpoint with the returned scroll_id to fetch subsequent batches. This lets you retrieve large result sets in manageable batches rather than in a single request that may time out.
Syntax
The Scroll API uses two main steps: first, initiate a search with a scroll parameter to keep the search context alive, then repeatedly call the _search/scroll endpoint with the scroll_id to get the next batch of results.
- Initial search: POST /index/_search?scroll=1m with a query and size.
- Scroll request: POST /_search/scroll with a JSON body containing scroll and scroll_id.
json
POST /my_index/_search?scroll=1m
{
  "size": 100,
  "query": { "match_all": {} }
}

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA..."
}
Example
This example shows how to retrieve all documents from an index named my_index in batches of 2 using the Scroll API.
json
POST /my_index/_search?scroll=1m
{
  "size": 2,
  "query": { "match_all": {} }
}

# Response contains hits and a scroll_id

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA..."
}

# Repeat scroll requests until hits are empty
Output
{
"_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA...",
"hits": {
"hits": [
{"_id": "1", "_source": {"field": "value1"}},
{"_id": "2", "_source": {"field": "value2"}}
]
}
}
# Next scroll call returns next batch or empty hits when done
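The request-and-repeat cycle above can be sketched as a small Python generator. This is a minimal illustration, not a definitive implementation: the function name scroll_hits is hypothetical, and the client argument is assumed to be any object exposing search, scroll, and clear_scroll methods with the keyword arguments shown (the names are modeled on the official Python client, but check them against the client version you use).

```python
from typing import Any, Dict, Iterator


def scroll_hits(
    client: Any,
    index: str,
    query: Dict[str, Any],
    size: int = 100,
    keep_alive: str = "1m",
) -> Iterator[Dict[str, Any]]:
    """Yield every hit matching `query`, fetching `size` hits per request.

    `client` is an assumption: any object with search(), scroll(), and
    clear_scroll() methods that accept the keyword arguments used below.
    """
    # Step 1: the initial search; the scroll parameter opens the search context.
    response = client.search(index=index, scroll=keep_alive, size=size, query=query)
    scroll_id = response["_scroll_id"]
    try:
        while True:
            hits = response["hits"]["hits"]
            if not hits:
                # An empty batch means the result set is exhausted.
                break
            yield from hits
            # Step 2: fetch the next batch and renew the keep-alive time.
            response = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            # Always continue with the most recently returned scroll id.
            scroll_id = response["_scroll_id"]
    finally:
        # Free the search context even if the caller stops iterating early.
        client.clear_scroll(scroll_id=scroll_id)
```

Because the cleanup sits in a finally block, the scroll context is cleared both when the results run out and when the consumer abandons the generator partway through.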
Common Pitfalls
- Omitting the scroll parameter in the initial search runs an ordinary search: no scroll context is created and no _scroll_id is returned.
- Using an expired scroll_id results in an error; keep the context alive by sending each scroll request within the keep-alive time (each request resets the timer).
- Not clearing the scroll context after finishing wastes resources; use DELETE /_search/scroll to clear it.
- Setting size too large can cause memory issues; use reasonable batch sizes.
json
### Wrong: Missing scroll parameter
POST /my_index/_search
{
"size": 100,
"query": { "match_all": {} }
}
### Right: Include scroll parameter
POST /my_index/_search?scroll=1m
{
"size": 100,
"query": { "match_all": {} }
}
### Clear scroll after use
DELETE /_search/scroll
{
"scroll_id" : ["DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA..."]
}
Quick Reference
| Action | Endpoint | Description |
|---|---|---|
| Start scroll | POST /index/_search?scroll=1m | Begin scrolling with a time to keep context alive |
| Continue scroll | POST /_search/scroll | Fetch next batch using scroll_id |
| Clear scroll | DELETE /_search/scroll | Clear scroll context to free resources |
| Set batch size | size parameter | Number of results per batch |
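For readers calling the REST endpoints directly, the three actions in the table can be sketched with Python's standard library alone. This only builds the requests and does not send them. The base URL (a local, unsecured cluster on port 9200) and the helper names start_scroll, continue_scroll, and clear_scroll are assumptions made for illustration.

```python
import json
from urllib.request import Request

# Assumption: a local cluster without authentication or TLS.
BASE_URL = "http://localhost:9200"


def _json_request(method: str, path: str, body: dict) -> Request:
    """Build (but do not send) an HTTP request with a JSON body."""
    return Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def start_scroll(index: str, query: dict, size: int = 100, keep_alive: str = "1m") -> Request:
    # Start scroll: POST /index/_search?scroll=1m with a query and batch size.
    return _json_request(
        "POST", f"/{index}/_search?scroll={keep_alive}", {"size": size, "query": query}
    )


def continue_scroll(scroll_id: str, keep_alive: str = "1m") -> Request:
    # Continue scroll: POST /_search/scroll with the scroll_id from the last response.
    return _json_request(
        "POST", "/_search/scroll", {"scroll": keep_alive, "scroll_id": scroll_id}
    )


def clear_scroll(scroll_id: str) -> Request:
    # Clear scroll: DELETE /_search/scroll to free the search context.
    return _json_request("DELETE", "/_search/scroll", {"scroll_id": [scroll_id]})
```

Each returned Request can be passed to urllib.request.urlopen to send it; a real deployment would also need whatever authentication and TLS settings the cluster requires.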
Key Takeaways
Always include the scroll parameter in the initial search to keep the context alive.
Use the scroll_id returned to fetch subsequent batches until no more results remain.
Clear the scroll context after finishing to avoid resource leaks.
Choose a reasonable batch size to balance performance and memory use.
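The Scroll API is designed for processing large result sets in bulk, not for real-time user queries; recent Elasticsearch documentation recommends search_after with a point in time (PIT) for deep pagination.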