
Scroll API for deep pagination in Elasticsearch

Introduction

The Scroll API retrieves large result sets in small batches, so your app doesn't slow down or run out of memory. Use it when:

You want to get thousands of search results without waiting a long time.
You need to process all matching items step by step, like reading pages in a book.
You want to export large data sets from Elasticsearch safely.
You have a web app that shows search results page by page beyond the first few pages.
You want to avoid the heavy memory use of loading all results at once.
Syntax
Elasticsearch
POST /index_name/_search?scroll=1m
{
  "size": 100,
  "query": {
    "match_all": {}
  }
}

scroll=1m keeps the search context alive for 1 minute; each follow-up scroll request resets this timer.

size controls how many results you get per batch.

Examples
Start scrolling to get 50 book products at a time, keeping the context alive for 2 minutes.
Elasticsearch
POST /products/_search?scroll=2m
{
  "size": 50,
  "query": {
    "match": { "category": "books" }
  }
}
Use this request to get the next batch of results, passing the scroll ID returned by the previous response.
Elasticsearch
POST /_search/scroll
{
  "scroll": "2m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA..."
}
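The two requests above can be combined into a loop that keeps fetching until a batch comes back empty, which is how you know the results are exhausted. A minimal sketch in Python (`scroll_all`, `start_search`, and `fetch_next` are illustrative names, not part of any API):

```python
def scroll_all(start_search, fetch_next):
    """Collect every hit by scrolling until a batch comes back empty.

    start_search() returns the first Elasticsearch-style response body;
    fetch_next(scroll_id) returns the next one. Both are expected to
    contain '_scroll_id' and 'hits' -> 'hits', like the responses above.
    """
    results = []
    resp = start_search()
    while resp['hits']['hits']:          # empty batch means we are done
        results.extend(resp['hits']['hits'])
        resp = fetch_next(resp['_scroll_id'])
    return results
```

With the requests library, `start_search` would POST to /products/_search?scroll=2m and `fetch_next` would POST the scroll ID to /_search/scroll, exactly as in the two example requests above.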
Sample Program

This program gets search results from Elasticsearch in two batches using the Scroll API. It first requests 3 results, then uses the scroll ID to get the next 3.

Python
import requests

# Step 1: Start scroll search
url = 'http://localhost:9200/products/_search?scroll=1m'
query = {
    "size": 3,
    "query": {"match_all": {}}
}
response = requests.post(url, json=query)
data = response.json()
scroll_id = data['_scroll_id']
hits = data['hits']['hits']

print('First batch:')
for hit in hits:
    print(hit['_source'])

# Step 2: Get next batch
scroll_url = 'http://localhost:9200/_search/scroll'
scroll_query = {
    "scroll": "1m",
    "scroll_id": scroll_id
}
response2 = requests.post(scroll_url, json=scroll_query)
data2 = response2.json()
hits2 = data2['hits']['hits']

print('\nSecond batch:')
for hit in hits2:
    print(hit['_source'])
Important Notes

Set the keep-alive time (like 1m) long enough to process one batch before requesting the next; it does not need to cover the whole export, because each scroll request resets it.

The Scroll API is meant for deep data processing, such as exports, not for real-time user searches.

Remember to clear scroll contexts explicitly if you stop early, to free server resources.
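A scroll context is cleared with a DELETE request to the clear-scroll endpoint, passing the scroll ID from the search response (scroll ID shortened here as in the earlier example):

```
DELETE /_search/scroll
{
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAA..."
}
```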

Summary

Scroll API helps get large search results in small parts safely.

Use scroll ID to fetch next batches step by step.

Keep scroll context alive with the scroll parameter.