Async search for expensive queries
📖 Scenario: You work with a large Elasticsearch database that stores product sales data. Some queries take a long time to run because they analyze a lot of data. To avoid waiting and blocking your application, you want to use Elasticsearch's async search feature. This lets you start a search and check back later for the results.
🎯 Goal: Build a simple async search workflow using Elasticsearch's REST API. You will start an async search for products with sales over a certain amount, then check the status and finally get the results.
📋 What You'll Learn
Create an async search request with a query for products with sales greater than 1000
Store the async search ID returned by Elasticsearch
Use the async search ID to check the status of the search
Retrieve and print the final search results
💡 Why This Matters
🌍 Real World
Async search is useful when queries take a long time on large datasets. It lets applications stay responsive by checking back later for results.
💼 Career
Many data engineering and backend developer roles involve working with Elasticsearch, including using async search to keep applications responsive while long-running queries execute.
1
Create async search request
Start an async search with the Elasticsearch Python client: call client.async_search.submit with index "products" and a range query that matches products whose sales field is greater than 1000. This is equivalent to sending a POST request to /products/_async_search with the query as the JSON body. Store the response in a variable called response.
Hint
Use client.async_search.submit with the correct index and query body to start the async search.
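A minimal sketch of this step, assuming the 8.x Python client, where submit accepts the query as a query= keyword (older clients take body={"query": ...} instead). The _StubAsyncSearch class is a hypothetical stand-in so the snippet runs without a cluster; against a real cluster you would build the client with Elasticsearch("http://localhost:9200"), where the URL is an assumption.

```python
from types import SimpleNamespace


def build_sales_query(min_sales):
    """Range query matching documents whose `sales` field exceeds min_sales."""
    return {"range": {"sales": {"gt": min_sales}}}


class _StubAsyncSearch:
    """Hypothetical stand-in for client.async_search; records the call."""

    def __init__(self):
        self.last_call = None

    def submit(self, **kwargs):
        self.last_call = kwargs
        # Shape mirrors a still-running async search response.
        return {"id": "stub-search-id", "is_running": True, "is_partial": True}


# Swap this for a real Elasticsearch client to hit a cluster.
client = SimpleNamespace(async_search=_StubAsyncSearch())

response = client.async_search.submit(
    index="products",
    query=build_sales_query(1000),
)
print(response)
```

Keeping the query in a small helper makes the threshold easy to change and easy to test in isolation.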
2
Store async search ID
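Extract the async search ID from response and store it in a variable called search_id. The ID is in response['id']. Elasticsearch includes it only when the search is still running after wait_for_completion_timeout (1 second by default), unless you submit with keep_on_completion set to true.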
Hint
The async search ID is returned in the 'id' field of the response.
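Since a search that finishes quickly may come back without an 'id', a defensive lookup avoids a KeyError. The sketch below uses two hypothetical response dictionaries for illustration; extract_search_id is a helper name introduced here, not part of the client.

```python
def extract_search_id(response):
    """Return the async search ID, or None when the response has no 'id'."""
    return response["id"] if "id" in response else None


# Hypothetical response shapes for illustration.
still_running = {"id": "abc123", "is_running": True, "is_partial": True}
finished_inline = {
    "is_running": False,
    "is_partial": False,
    "response": {"hits": {"hits": []}},
}

search_id = extract_search_id(still_running)
print(search_id)
print(extract_search_id(finished_inline))
```

When the helper returns None, the results are already in the submit response and no follow-up request is needed.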
3
Check async search status
Use the search_id to get the current status of the async search by calling client.async_search.get with id=search_id. Store the result in a variable called status_response.
Hint
Use client.async_search.get with the async search ID to check the status.
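A single call to get may find the search still running, so in practice the check is usually repeated. The following is a sketch of one possible polling loop; poll_until_done, its parameters, and the stub client are hypothetical names used only so the example runs without a cluster.

```python
import time
from types import SimpleNamespace


def poll_until_done(client, search_id, interval_s=1.0, max_polls=30):
    """Call async_search.get until is_running is False or polls run out."""
    status_response = client.async_search.get(id=search_id)
    polls = 1
    while status_response["is_running"] and polls < max_polls:
        time.sleep(interval_s)
        status_response = client.async_search.get(id=search_id)
        polls += 1
    return status_response


class _StubAsyncSearch:
    """Hypothetical stand-in: reports 'running' twice, then completes."""

    def __init__(self):
        self.calls = 0

    def get(self, id):
        self.calls += 1
        running = self.calls < 3
        return {
            "id": id,
            "is_running": running,
            "is_partial": running,
            "response": {"hits": {"hits": []}},
        }


client = SimpleNamespace(async_search=_StubAsyncSearch())
status_response = poll_until_done(client, "stub-search-id", interval_s=0)
print(status_response["is_running"], client.async_search.calls)
```

The max_polls cap keeps the loop from running forever if the search never completes.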
4
Print async search results
Print the hits from the async search results stored in status_response. The hits are in status_response['response']['hits']['hits']. Use print() to display them.
Hint
Print the list of hits from status_response['response']['hits']['hits']. It may be empty if no products match.
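A sketch of this step using a hypothetical status_response with made-up product documents. It assumes status_response behaves like a plain dict (with the 8.x client, status_response.body is one); extract_hits is a helper introduced here for illustration.

```python
def extract_hits(status_response):
    """Return the hit list, tolerating a missing or empty response."""
    return status_response.get("response", {}).get("hits", {}).get("hits", [])


# Hypothetical completed response; the documents are invented sample data.
status_response = {
    "is_running": False,
    "is_partial": False,
    "response": {
        "hits": {
            "hits": [
                {"_id": "1", "_source": {"name": "Laptop", "sales": 1500}},
                {"_id": "2", "_source": {"name": "Phone", "sales": 1200}},
            ]
        }
    },
}

if status_response["is_partial"]:
    print("Warning: results are partial")

hits = extract_hits(status_response)
for hit in hits:
    print(hit["_id"], hit["_source"])
```

Checking is_partial before printing makes it obvious when the output does not yet reflect the whole dataset.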
Practice
1. What is the main benefit of using async search in Elasticsearch for expensive queries?
easy
A. It caches all query results permanently.
B. It automatically speeds up the query execution time.
C. It disables query logging to improve performance.
D. It allows running slow queries without blocking the application.
Solution
Step 1: Understand async search purpose
Async search lets you run slow or heavy queries without making your app wait or freeze.
Step 2: Identify the main benefit
This means your app can continue working while the query runs in the background.
Final Answer:
It allows running slow queries without blocking the application. -> Option D
Quick Check:
Async search = non-blocking query execution [OK]
Hint: Async search runs queries in background, so app doesn't wait [OK]
Common Mistakes:
Thinking async search speeds up queries automatically
Assuming async search caches results permanently
Believing async search disables logging
2. Which of the following is the correct way to start an async search request in Elasticsearch using the REST API?
easy
A. POST /_async_search
{
"query": { "match_all": {} }
}
B. GET /_async_search
{
"query": { "match_all": {} }
}
C. POST /_search/async
{
"query": { "match_all": {} }
}
D. PUT /_async_search
{
"query": { "match_all": {} }
}
Solution
Step 1: Recall async search API endpoint
The correct endpoint to start an async search is POST /_async_search with the query in the body.
Step 2: Check HTTP method and path
GET and PUT are the wrong HTTP methods for starting an async search, and /_search/async is not a valid endpoint path.
Final Answer:
POST /_async_search with query body -> Option A
Quick Check:
Start async search = POST /_async_search [OK]
Hint: Use POST method on /_async_search to start async search [OK]
Common Mistakes:
Using GET instead of POST to start async search
Using wrong endpoint like /_search/async
Using PUT method which is invalid here
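To make the method and path concrete, here is a sketch that builds the HTTP request with Python's standard library without sending it. The cluster address in ES_URL and the helper name build_async_search_request are assumptions for illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed local cluster address


def build_async_search_request(body, index=None):
    """Build (but do not send) a POST request to the _async_search endpoint."""
    path = f"/{index}/_async_search" if index else "/_async_search"
    return urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_async_search_request({"query": {"match_all": {}}})
print(request.get_method(), request.full_url)
# To send it against a live cluster: urllib.request.urlopen(request)
```

Passing an index name, for example "products", targets /products/_async_search instead of searching every index.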
3. Given this async search response snippet, what does the id field represent?
A. Missing comma between query and wait_for_completion_timeout fields.
B. Using POST instead of GET method.
C. wait_for_completion_timeout cannot be set in the request body.
D. The field name "title" is invalid in match query.
Solution
Step 1: Check JSON syntax
The JSON body is missing a comma after the closing brace of the "query" object.
Step 2: Validate method and fields
POST is correct method, wait_for_completion_timeout is valid in body, and "title" is a valid field name.
Final Answer:
Missing comma between query and wait_for_completion_timeout fields. -> Option A
Quick Check:
JSON syntax error = missing comma [OK]
Hint: Check commas between JSON fields carefully [OK]
Common Mistakes:
Forgetting commas between JSON objects
Confusing HTTP methods for async search
Misplacing wait_for_completion_timeout outside body
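A missing comma can be caught locally before the request is ever sent. The sketch below parses two hypothetical request bodies with Python's json module; the field values and the "size" field are invented for illustration and are not the snippet from the question.

```python
import json


def check_json(text):
    """Return (parsed_body, None) on success or (None, error_message) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"


# Hypothetical bodies: the first lacks the comma after the "query" object.
broken = '{\n  "query": { "match": { "title": "laptop" } }\n  "size": 10\n}'
fixed = '{\n  "query": { "match": { "title": "laptop" } },\n  "size": 10\n}'

print(check_json(broken))
print(check_json(fixed))
```

The error message reports the line and column where parsing stopped, which usually points at the field that follows the missing comma.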
5. You want to run a very expensive aggregation query on a large dataset without timing out. Which approach using async search is best to get the final results efficiently?
hard
A. Run a normal search with a very high timeout value to wait for results.
B. Start async search with a long wait_for_completion_timeout and poll using the returned id until results are ready.
C. Start async search and immediately request results without waiting for completion.
D. Run the query multiple times with smaller timeouts and merge results manually.
Solution
Step 1: Understand async search timeout and polling
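wait_for_completion_timeout sets how long the submit request waits for the search to finish. If the search completes within that window, the results come back in the same response; otherwise Elasticsearch returns an id and keeps running the search in the background.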
Step 2: Use the returned id to poll for completion
You can check the status later using the id until the results are ready, avoiding timeouts and blocking.
Final Answer:
Start async search with a long wait_for_completion_timeout and poll using the returned id until results are ready. -> Option B
Quick Check:
Async search + polling = efficient for expensive queries [OK]
Hint: Use wait_for_completion_timeout + poll with id for big queries [OK]
Common Mistakes:
Using a normal search with a high timeout, which risks freezing the app
Requesting results immediately before completion
Manually merging partial results instead of async search
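The approach in Option B can be sketched end to end as follows. This assumes the 8.x Python client's submit, get, and delete methods on client.async_search; the timeout values, the backoff schedule, the range query, and the stub client are illustrative assumptions, not recommendations from the source.

```python
import time
from types import SimpleNamespace


def run_expensive_search(client, index, query, max_polls=20, first_delay_s=0.5):
    """Submit an async search, poll with backoff, and return the final response."""
    response = client.async_search.submit(
        index=index,
        query=query,
        wait_for_completion_timeout="2s",  # return inline if finished this fast
        keep_alive="10m",  # how long the server retains the search
    )
    if not response["is_running"]:
        return response  # finished within the timeout; no polling needed

    search_id = response["id"]
    delay = first_delay_s
    for _ in range(max_polls):
        time.sleep(delay)
        response = client.async_search.get(id=search_id)
        if not response["is_running"]:
            client.async_search.delete(id=search_id)  # free stored results early
            return response
        delay = min(delay * 2, 10.0)  # exponential backoff, capped at 10 s
    raise TimeoutError(f"async search {search_id} still running")


class _StubAsyncSearch:
    """Hypothetical stand-in: the search completes on the second poll."""

    def __init__(self):
        self.polls = 0
        self.deleted = False

    def submit(self, **kwargs):
        return {"id": "stub-id", "is_running": True, "is_partial": True}

    def get(self, id):
        self.polls += 1
        done = self.polls >= 2
        return {
            "id": id,
            "is_running": not done,
            "is_partial": not done,
            "response": {"hits": {"hits": []}},
        }

    def delete(self, id):
        self.deleted = True


client = SimpleNamespace(async_search=_StubAsyncSearch())
result = run_expensive_search(
    client, "products", {"range": {"sales": {"gt": 1000}}}, first_delay_s=0
)
print(result["is_running"], client.async_search.polls)
```

Backing off between polls keeps the number of status requests low for searches that run for minutes, while the early return handles the case where the search finishes within the initial timeout.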