Scroll API for deep pagination in Elasticsearch - Time & Space Complexity
When using Elasticsearch's Scroll API, we want to understand how the time to get results changes as we ask for more data.
We ask: How does the work grow when we scroll through many pages of results?
Analyze the time complexity of the following Scroll API usage.
POST /my_index/_search?scroll=1m
{
  "size": 100,
  "query": { "match_all": {} }
}

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA..."
}
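The first request opens a scroll context that is kept alive for one minute and returns the first 100 results together with a scroll ID. Each follow-up request sends that scroll ID to fetch the next batch, and is repeated until no hits remain.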
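The request pair above can be turned into a loop. The following is a minimal sketch, not an official client: `scroll_all`, the `send` callable, and the in-memory `FakeCluster` are hypothetical names introduced here so the example runs without a cluster. The paths and bodies mirror the two requests shown above, plus a `DELETE /_search/scroll` call that releases the scroll context once the results run out.

```python
from typing import Callable, Iterator

# Hypothetical transport: (method, path, body) -> parsed JSON response.
Send = Callable[[str, str, dict], dict]


def scroll_all(send: Send, index: str, size: int = 100,
               keep_alive: str = "1m") -> Iterator[dict]:
    """Yield every hit, one batch per request, until a batch comes back empty."""
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": size, "query": {"match_all": {}}})
    try:
        while resp["hits"]["hits"]:
            yield from resp["hits"]["hits"]
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": resp["_scroll_id"]})
    finally:
        # Release the scroll context instead of waiting for it to expire.
        send("DELETE", "/_search/scroll", {"scroll_id": resp["_scroll_id"]})


class FakeCluster:
    """In-memory stand-in for a cluster (an assumption, for illustration only)."""

    def __init__(self, total_docs: int) -> None:
        self.docs = [{"_id": str(i)} for i in range(total_docs)]
        self.cursor = 0
        self.size = 0
        self.fetch_requests = 0

    def send(self, method: str, path: str, body: dict) -> dict:
        if method == "DELETE":
            return {"succeeded": True}
        self.fetch_requests += 1
        if "size" in body:  # only the initial search sets the batch size
            self.size = body["size"]
        batch = self.docs[self.cursor:self.cursor + self.size]
        self.cursor += len(batch)
        return {"_scroll_id": "fake-scroll-id", "hits": {"hits": batch}}


cluster = FakeCluster(total_docs=250)
hits = list(scroll_all(cluster.send, "my_index", size=100))
print(len(hits), cluster.fetch_requests)  # 250 hits in 4 requests (3 batches + 1 empty)
```

In this sketch the loop issues one extra request whose empty hits list signals the end, so the request count is the number of non-empty batches plus one.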
Look at what repeats when scrolling through results.
- Primary operation: Each scroll request fetches a batch of results from the server.
- How many times: The scroll request repeats once per batch until all results are retrieved.
As you ask for more results, the number of scroll requests grows roughly in proportion to the total results divided by batch size.
| Input Size (total results) | Approx. Operations (scroll requests) |
|---|---|
| 10 | 1 (one batch) |
| 100 | 1 (one batch) |
| 1000 | 10 (ten batches) |
Pattern observation: The number of operations grows linearly with the total number of results requested.
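The rows of the table follow from a ceiling division. A small sketch, assuming the batch size of 100 used in the request above; `scroll_batches` is a name chosen here for illustration.

```python
import math


def scroll_batches(total_results: int, batch_size: int = 100) -> int:
    """Number of non-empty batches needed to return total_results hits."""
    return math.ceil(total_results / batch_size)


for n in (10, 100, 1000):
    print(n, scroll_batches(n))
# 10 1
# 100 1
# 1000 10
```

For a fixed batch size, doubling the total number of results doubles the number of batches, which is the linear growth described above.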
Time Complexity: O(n)
This means the time to get all results grows directly in proportion to how many results you want.
[X] Wrong: "Scrolling through results is a single fast operation regardless of result size."
[OK] Correct: Each scroll request fetches a batch, so more results mean more requests and more time.
Understanding how scrolling scales helps you explain how to handle large data sets efficiently in real projects.
What if we increased the batch size in each scroll request? How would the time complexity change?
Practice
Scroll API in Elasticsearch?
Solution
Step 1: Understand Scroll API usage
The Scroll API is designed to handle large result sets by breaking them into smaller parts.
Step 2: Compare options with Scroll API purpose
Options B, C, and D relate to other Elasticsearch features, not scrolling.
Final Answer:
To retrieve large sets of search results in small, manageable batches. -> Option A
Quick Check:
Scroll API = batch retrieval [OK]
- Confusing Scroll API with bulk update operations
- Thinking Scroll API deletes or creates indices
- Assuming Scroll API returns all results at once
Solution
Step 1: Identify scroll search syntax
Starting a scroll requires a query, a scroll time, and a size that sets the batch size.
Step 2: Analyze options
- {"query": {"match_all": {}}, "scroll": "1m", "size": 100} includes the query, scroll time, and size correctly.
- {"scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA", "size": 100} uses scroll_id, which is for continuing a scroll, not starting one.
- {"query": {"match": {"field": "value"}}, "timeout": "1m"} lacks the scroll parameter.
- {"scroll": "1m", "update": true} has an invalid update field.
Final Answer:
{"query": {"match_all": {}}, "scroll": "1m", "size": 100} -> Option B
Quick Check:
Start scroll = query + scroll + size [OK]
- Using scroll_id to start scroll instead of continue
- Omitting the scroll parameter
- Confusing scroll with timeout or update
{
  "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA",
  "hits": {"hits": [{"_id": "1"}, {"_id": "2"}]}
}
Solution
Step 1: Understand scroll continuation
To get the next batch, send the scroll_id from the previous response together with the scroll parameter.
Step 2: Evaluate options
- "Use the scroll_id in a subsequent scroll request with the scroll parameter." correctly describes how to continue the scroll.
- "Send a new search request without scroll_id." restarts the search and loses the scroll context.
- "Delete the scroll_id to reset the scroll context." is incorrect: clearing a scroll releases its context instead of returning the next batch.
- "Use the hits array to manually fetch documents by ID." is manual and inefficient.
Final Answer:
Use the scroll_id in a subsequent scroll request with the scroll parameter. -> Option C
Quick Check:
Next scroll = scroll_id + scroll [OK]
- Restarting search instead of continuing scroll
- Ignoring scroll parameter in next request
- Trying to fetch documents manually by ID
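As a sketch of the continuation step, the body of the next request can be built from the previous response. `next_scroll_body` is a hypothetical helper, not part of any Elasticsearch client; it only assumes the response shape shown in the question above.

```python
def next_scroll_body(previous_response: dict, keep_alive: str = "1m") -> dict:
    """Build the body for POST /_search/scroll from the previous response."""
    return {
        "scroll": keep_alive,                          # keeps the context alive
        "scroll_id": previous_response["_scroll_id"],  # identifies the context
    }


response = {
    "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA",
    "hits": {"hits": [{"_id": "1"}, {"_id": "2"}]},
}
body = next_scroll_body(response)
print(body)
```

Note the naming difference: the response carries the value as `_scroll_id`, while the follow-up request sends it as `scroll_id`.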
{"scroll_id": "abc123"}. What is the likely cause?
Solution
Step 1: Check scroll request requirements
When continuing a scroll, the scroll parameter (a time value) must be included to keep the context alive.
Step 2: Analyze error cause
- "Missing the scroll parameter to keep the scroll context alive." correctly identifies the missing scroll parameter.
- "The scroll_id is invalid and must be a number." is wrong; scroll_id is a string.
- "You cannot use scroll_id in a scroll request." is false; scroll_id is required.
- "The size parameter is required with scroll_id." is incorrect; size is not required when continuing a scroll.
Final Answer:
Missing the scroll parameter to keep the scroll context alive. -> Option A
Quick Check:
Scroll continuation needs scroll param [OK]
- Omitting scroll parameter in scroll continuation
- Assuming scroll_id must be numeric
- Thinking size is needed in every scroll request
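The checks from this question can be written as a small client-side validator. This is a sketch of the rules as stated in the solution above, not a reproduction of the server's own validation or error messages; `check_scroll_continuation` is a hypothetical helper.

```python
from typing import List


def check_scroll_continuation(body: dict) -> List[str]:
    """Return the problems found in a scroll continuation body (empty if none)."""
    problems = []
    if not isinstance(body.get("scroll_id"), str):
        problems.append("scroll_id must be present and must be a string")
    if "scroll" not in body:
        problems.append("scroll (keep-alive time) is missing")
    # size is deliberately not checked: it is set once, on the initial search.
    return problems


print(check_scroll_continuation({"scroll_id": "abc123"}))
print(check_scroll_continuation({"scroll": "1m", "scroll_id": "abc123"}))
```

Running the check before sending a request catches the mistakes listed above early, without a round trip to the cluster.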
Solution
Step 1: Understand deep pagination with Scroll API
The Scroll API is designed to fetch large result sets in small batches, with a scroll timeout that keeps the context alive between requests.
Step 2: Evaluate options for best practice
- "Use a scroll time of 1 minute and fetch batches of 100 documents repeatedly until no hits remain." correctly uses scroll time and batch size to safely retrieve all documents.
- "Set size to 10,000 in a single search request without scrolling." risks memory overload.
- "Use the Scroll API but do not specify the scroll parameter to speed up retrieval." is invalid because the scroll parameter is required.
- "Fetch documents by IDs one by one using separate queries." is inefficient and slow.
Final Answer:
Use a scroll time of 1 minute and fetch batches of 100 documents repeatedly until no hits remain. -> Option D
Quick Check:
Scroll API + batch + scroll time = safe deep pagination [OK]
- Requesting all documents at once causing memory errors
- Omitting scroll parameter to speed up
- Fetching documents individually instead of batches
