Scroll API for deep pagination in Elasticsearch - Step-by-Step Execution

Concept Flow - Scroll API for deep pagination
Start Scroll Search
Receive initial batch of results + scroll_id
Use scroll_id to request next batch
Receive next batch + updated scroll_id
No more results?
Yes: End Scroll
No: Back to Use scroll_id
The Scroll API starts a search and returns a batch of results together with a scroll ID. You send that ID back to fetch the next batch, and repeat until a batch comes back empty.
Execution Sample
Elasticsearch
POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA..."
}
This request uses the scroll ID to get the next batch of search results within the scroll context.
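The request above, and the initial search that precedes it, can be built from Python's standard library. This is a minimal sketch rather than a reference client: the base URL (a local, unsecured cluster), the index name `my-index`, and both function names are assumptions made for illustration. It also assumes the keep-alive for the initial search is passed as the `scroll` query-string parameter.

```python
import json
import urllib.request

# Assumption: a local, unsecured cluster; replace with your own endpoint.
BASE_URL = "http://localhost:9200"


def initial_scroll_request(index, size, keep_alive="1m"):
    """Build the search request that opens a scroll context."""
    body = {"size": size, "query": {"match_all": {}}}
    return urllib.request.Request(
        f"{BASE_URL}/{index}/_search?scroll={keep_alive}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def next_batch_request(scroll_id, keep_alive="1m"):
    """Build the follow-up request that fetches the next batch."""
    body = {"scroll": keep_alive, "scroll_id": scroll_id}
    return urllib.request.Request(
        f"{BASE_URL}/_search/scroll",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Inspect the request without sending it ("my-index" is a hypothetical index).
first = initial_scroll_request("my-index", size=2)
print(first.get_method(), first.full_url)

req = next_batch_request("scroll_id_1")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

To actually send either request, pass it to `urllib.request.urlopen` and parse the response body with `json.load`.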
Execution Table
Step | Action | Input | Output | scroll_id | Notes
1 | Start scroll search | ?scroll=1m with body {"size":2,"query":{"match_all":{}}} | Batch 1 results (2 items) | scroll_id_1 | Initial search returns the first 2 results and scroll_id_1
2 | Request next batch | {"scroll":"1m","scroll_id":"scroll_id_1"} | Batch 2 results (2 items) | scroll_id_2 | Use scroll_id_1 to get the next 2 results and the new scroll_id_2
3 | Request next batch | {"scroll":"1m","scroll_id":"scroll_id_2"} | Batch 3 results (1 item) | scroll_id_3 | Next batch has 1 result; scroll_id_3 is returned
4 | Request next batch | {"scroll":"1m","scroll_id":"scroll_id_3"} | No results | null | No more results, scroll ends
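💡 The final request returns no results, so the loop stops and the tracked scroll_id is reset to null. Elasticsearch keeps the search context open until its keep-alive lapses or the client clears it.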
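The four steps in the table can be sketched as a loop. The example below is illustrative only: `iter_scroll_batches` and `FakeCluster` are hypothetical names, and `FakeCluster` is an in-memory stand-in that mimics the shape of a scroll response (`_scroll_id` plus `hits.hits`) so the loop can run without a real cluster.

```python
def iter_scroll_batches(start, fetch_next):
    """Yield each non-empty batch of hits until the scroll is exhausted.

    start() and fetch_next(scroll_id) are caller-supplied callables that
    return a parsed response dict shaped like an Elasticsearch search reply.
    """
    response = start()
    while response["hits"]["hits"]:
        yield response["hits"]["hits"]
        # Always continue from the most recently returned scroll ID.
        response = fetch_next(response["_scroll_id"])


class FakeCluster:
    """In-memory stand-in for Elasticsearch, for illustration only."""

    def __init__(self, docs, size):
        self.docs, self.size, self.pos, self.calls = docs, size, 0, 0

    def _page(self):
        # Serve the next slice of documents and count the request.
        self.calls += 1
        hits = self.docs[self.pos:self.pos + self.size]
        self.pos += len(hits)
        return {"_scroll_id": f"scroll_id_{self.calls}", "hits": {"hits": hits}}

    def start(self):
        return self._page()

    def fetch_next(self, scroll_id):
        return self._page()


# Five documents with a batch size of 2, as in the execution table.
cluster = FakeCluster(docs=["a", "b", "c", "d", "e"], size=2)
batches = list(iter_scroll_batches(cluster.start, cluster.fetch_next))
print([len(b) for b in batches])  # [2, 2, 1]
print(cluster.calls)              # 4 requests, the last one returning no hits
```

Passing the transport in as callables keeps the loop independent of any particular HTTP client, so the same generator works against a real cluster once `start` and `fetch_next` send actual requests.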
Variable Tracker
Variable | Start | After Step 1 | After Step 2 | After Step 3 | After Step 4
scroll_id | null | scroll_id_1 | scroll_id_2 | scroll_id_3 | null
results_count | 0 | 2 | 2 | 1 | 0
Key Moments - 3 Insights
Why do we need to keep using the scroll_id for each next request?
The scroll ID identifies the search context on the server, which tracks how far through the results you are. Sending the most recently returned ID tells Elasticsearch where to continue, as shown in steps 2 and 3 of the execution table.
What happens when the scroll returns no results?
An empty batch (step 4) means every matching document has been returned, so no further scroll requests are needed. The search context is released once its keep-alive lapses, or sooner if you clear it with DELETE /_search/scroll.
Why do we specify a scroll time like "1m" in each request?
The scroll time keeps the search context alive on the server for that duration. Each request resets the timer, so the value only needs to cover the time it takes to process one batch, not the whole scroll.
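The keep-alive behaviour can be pictured with a toy model. This is not Elasticsearch code: `ScrollContext` is a hypothetical class using a simplified clock measured in seconds, and it only illustrates how each request pushes the expiry deadline forward.

```python
class ScrollContext:
    """Toy model of a server-side search context with a keep-alive timer."""

    def __init__(self, now, keep_alive_seconds):
        self.expires_at = now + keep_alive_seconds

    def touch(self, now, keep_alive_seconds):
        """Handle a scroll request arriving at time `now` (in seconds)."""
        if now > self.expires_at:
            raise RuntimeError("search context expired")
        # The new deadline is measured from this request, not the first one.
        self.expires_at = now + keep_alive_seconds


ctx = ScrollContext(now=0, keep_alive_seconds=60)
ctx.touch(now=45, keep_alive_seconds=60)   # deadline moves to t=105
ctx.touch(now=100, keep_alive_seconds=60)  # still alive, deadline moves to t=160

try:
    # Arrives 100 seconds after the previous request, past the deadline.
    ctx.touch(now=200, keep_alive_seconds=60)
    expired = False
except RuntimeError:
    expired = True
print(expired)
```

In this model the scroll as a whole runs far longer than 60 seconds; it fails only when the gap between two consecutive requests exceeds the keep-alive.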
Visual Quiz - 3 Questions
Test your understanding
According to the execution table, what is the scroll_id after step 2?
A. scroll_id_1
B. scroll_id_3
C. scroll_id_2
D. null
💡 Hint
Check the scroll_id column in execution_table row for step 2
At which step does the Scroll API return no results?
A. Step 4
B. Step 1
C. Step 3
D. Step 2
💡 Hint
Look at the Output column in execution_table for the step with 'No results'
If the scroll time were not specified in the requests, what would happen?
A. The scroll session would stay open indefinitely
B. The scroll session might expire too soon and cause errors
C. The scroll_id would change automatically
D. Results would be returned faster
💡 Hint
Refer to key_moments about the importance of the scroll time parameter
Concept Snapshot
The Scroll API lets you fetch a large result set in batches.
Start with a search request that specifies a scroll time and a batch size.
Each batch comes with a scroll_id; use it to request the next batch.
Repeat until no results remain.
Always include scroll time to keep the session alive.
Full Transcript
The Scroll API in Elasticsearch helps you retrieve a large set of search results in small pieces. First, you send a search request with a scroll time and a batch size. Elasticsearch returns the first batch of results and a scroll_id. You then use this scroll_id in the next request to get the next batch. This process repeats, each time using the most recently returned scroll_id, until a batch comes back empty. The scroll time keeps the search context alive on the server, and each request resets it. An empty batch means every result has been read; the search context is then released when its keep-alive lapses or when you clear it explicitly. This method is useful for deep pagination, where from/size paging becomes inefficient and is capped at 10,000 results by default (index.max_result_window).
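A search context holds resources on the server until its keep-alive lapses, so it is common to release it explicitly once the last batch has been read. The sketch below builds that clear-scroll request with the standard library. The function name and the default base URL (a local, unsecured cluster) are assumptions for illustration.

```python
import json
import urllib.request


def clear_scroll_request(scroll_id, base_url="http://localhost:9200"):
    """Build a DELETE /_search/scroll request that frees one scroll context."""
    body = {"scroll_id": scroll_id}
    return urllib.request.Request(
        f"{base_url}/_search/scroll",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="DELETE",
    )


# Inspect the request without sending it.
req = clear_scroll_request("scroll_id_3")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending it with `urllib.request.urlopen(req)` inside a `finally` block is one way to make sure the context is released even if processing a batch raises an error.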