
Percolate queries (reverse search) in Elasticsearch - Deep Dive

Overview - Percolate queries (reverse search)
What is it?
Percolate queries in Elasticsearch let you register queries first and then check which of those queries match a new document. Instead of searching documents with a query, you search queries with a document. This reverse search helps find all queries interested in a given piece of data. It is useful for alerting, notifications, and matching new data against saved criteria.
Why it matters
Without percolate queries, you would have to run every saved query against new data manually, which is slow and inefficient. Percolate queries solve this by indexing queries and quickly finding matches when new documents arrive. This saves time and resources, enabling real-time matching and alerting in applications like monitoring, recommendation, and security.
Where it fits
Before learning percolate queries, you should understand basic Elasticsearch concepts like indexing, documents, and standard queries. After mastering percolate queries, you can explore advanced alerting systems, real-time data processing, and integrating Elasticsearch with event-driven architectures.
Mental Model
Core Idea
Percolate queries flip the usual search: instead of searching documents with queries, you search queries with documents.
Think of it like...
Imagine a job board where job seekers save their searches as alerts. Instead of each person searching the postings, the board checks every new posting against all saved alerts and notifies the people whose alerts match.
┌───────────────┐       ┌───────────────┐
│ Registered    │       │ New Document  │
│ Queries       │◄──────│ (to match)    │
└───────────────┘       └───────────────┘
        ▲                      │
        │                      ▼
   Percolate Engine  ──────► Matches Queries
        │                      │
        └───────────────► Output: List of matching queries
Build-Up - 7 Steps
1. Foundation: Basic Elasticsearch Query Concept
Concept: Understand how Elasticsearch normally searches documents using queries.
In Elasticsearch, you store documents in an index. To find documents, you write queries that describe what you want. For example, a query might look for documents where the 'title' contains 'database'. Elasticsearch returns documents matching that query.
Result
You get a list of documents matching your search criteria.
Knowing how normal queries work is essential because percolate queries reverse this process.
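To make the contrast with percolation concrete, here is a deliberately tiny Python model of a normal search: a few sample documents (made up for illustration), the match-query body Elasticsearch would receive for the "title contains database" example, and a toy evaluator. The `toy_match` helper is a hypothetical stand-in that lowercases and splits on whitespace; Elasticsearch's analyzers do much more.

```python
import json

# Hypothetical sample documents (not from any real index).
documents = [
    {"id": 1, "title": "Intro to database indexing"},
    {"id": 2, "title": "Cooking with cast iron"},
    {"id": 3, "title": "Database replication basics"},
]

# Request body for "title contains database" (sent to GET /<index>/_search).
search_body = {"query": {"match": {"title": "database"}}}

def toy_match(doc: dict, field: str, text: str) -> bool:
    """Rough stand-in for a `match` query: true if any query term appears
    as a lowercase whitespace token in the field."""
    tokens = set(str(doc.get(field, "")).lower().split())
    return any(term in tokens for term in text.lower().split())

def toy_search(docs, body):
    """Normal search: one query in, matching document IDs out."""
    field, text = next(iter(body["query"]["match"].items()))
    return [d["id"] for d in docs if toy_match(d, field, text)]

print(json.dumps(search_body))
print(toy_search(documents, search_body))  # [1, 3]
```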
2. Foundation: What is a Percolate Query?
Concept: Learn that percolate queries store queries and match them against new documents.
Instead of searching documents with a query, you register queries in a special index. When a new document arrives, you ask: which registered queries match this document? Elasticsearch returns all queries that match the document.
Result
You get a list of queries that would find this document if it were searched normally.
This reversal allows efficient matching of new data against many saved queries.
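The reversal can be sketched with the same kind of toy Python model. Here the queries (hypothetical IDs and fields) are stored up front and the document is the input; the output is a list of query IDs. This sketch checks every stored query by brute force, which is exactly the cost Elasticsearch's percolator is built to reduce by pre-selecting candidate queries.

```python
# Hypothetical stored queries, keyed by ID, in query DSL shape.
stored_queries = {
    "q-db": {"match": {"title": "database"}},
    "q-cook": {"match": {"title": "cooking"}},
    "q-error": {"match": {"status": "error"}},
}

def toy_match(doc, field, text):
    """Rough stand-in for `match`: any query term is a lowercase token of the field."""
    tokens = set(str(doc.get(field, "")).lower().split())
    return any(term in tokens for term in text.lower().split())

def toy_percolate(document, queries):
    """Reverse search: one document in, IDs of matching stored queries out.
    Brute force: every stored query is evaluated against the document."""
    matches = []
    for query_id, query in queries.items():
        field, text = next(iter(query["match"].items()))
        if toy_match(document, field, text):
            matches.append(query_id)
    return matches

new_doc = {"title": "Database outage report", "status": "error"}
print(toy_percolate(new_doc, stored_queries))  # ['q-db', 'q-error']
```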
3. Intermediate: Setting Up a Percolator Index
Concept: Learn how to create an index that stores queries as documents.
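You create an index whose mapping includes a field of the special 'percolator' type. This field stores queries instead of normal data. For example, you define a field 'query' of type 'percolator' and then index documents where 'query' holds the actual query DSL. The same mapping must also declare the document fields that the stored queries refer to (such as 'title' or 'status'), because each query is parsed against the index's mapping when it is stored.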
Result
An index ready to store queries that can be matched against documents.
Understanding the special mapping is key to using percolate queries correctly.
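A sketch of the create-index request body, written as a Python dict so it can be printed as JSON. The index name `alerts` and the `title`/`status` fields are assumptions for illustration. The `percolator` field type is the essential part; the other fields are mapped because the stored queries refer to them.

```python
import json

INDEX = "alerts"  # hypothetical index name

# Body for PUT /alerts: a percolator field that holds the stored queries,
# plus mappings for the document fields those queries will reference.
create_index_body = {
    "mappings": {
        "properties": {
            "query": {"type": "percolator"},
            "title": {"type": "text"},
            "status": {"type": "keyword"},
        }
    }
}

print(f"PUT /{INDEX}")
print(json.dumps(create_index_body, indent=2))
```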
4. Intermediate: Registering Queries in the Percolator
Concept: Learn how to add queries to the percolator index as documents.
You add documents to the percolator index where each document contains a 'query' field with the query DSL. For example, a document might have a 'query' that matches documents with 'status' equal to 'error'. These stored queries are what get matched later.
Result
A collection of saved queries stored as documents in the percolator index.
Storing queries as documents allows Elasticsearch to index and search them efficiently.
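Registering a query is an ordinary index request whose body carries the query DSL in the percolator field. This sketch only builds the requests (hypothetical index `alerts` and query IDs) instead of calling a client library, so it runs without a cluster.

```python
import json

INDEX = "alerts"  # hypothetical index with a percolator field named "query"

# Each stored query is an ordinary document whose "query" field holds query DSL.
saved_queries = {
    "error-alert": {"query": {"match": {"status": "error"}}},
    "db-news": {"query": {"match": {"title": "database"}}},
}

def index_requests(index, queries):
    """Return (method, path, body) tuples for PUT /<index>/_doc/<id>."""
    return [("PUT", f"/{index}/_doc/{qid}", body) for qid, body in queries.items()]

for method, path, body in index_requests(INDEX, saved_queries):
    print(method, path, json.dumps(body))
```

Percolation is a search, so a newly registered query becomes visible only after the index refreshes. If you percolate right after registering, adding `?refresh=wait_for` to the PUT request is one way to wait for that.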
5. Intermediate: Running a Percolate Query
Before reading on: do you think you send a query or a document to find matches in percolate? Commit to your answer.
Concept: Learn how to send a document to find which stored queries match it.
You run a percolate query by sending a document to Elasticsearch through the search API, using the 'percolate' query type. Elasticsearch checks the document against the queries stored in the percolator index and returns the matching stored-query documents as ordinary search hits; each hit's ID identifies a query that matched.
Result
A list of query IDs that match the given document.
Knowing that you send a document to find matching queries flips the usual search mindset.
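The search side can be sketched the same way: one helper builds the `percolate` query body, another pulls the matching query IDs out of a search response. The example response is hand-written and trimmed to the `hits.hits[]._id` path for illustration; it is not output from a real cluster, and the index name is hypothetical.

```python
import json

INDEX = "alerts"  # hypothetical percolator index

def percolate_body(document, field="query"):
    """Body for GET /<index>/_search asking which stored queries match `document`."""
    return {"query": {"percolate": {"field": field, "document": document}}}

def matching_query_ids(response):
    """Matching stored queries come back as ordinary search hits."""
    return [hit["_id"] for hit in response["hits"]["hits"]]

body = percolate_body({"title": "Database outage", "status": "error"})
print(f"GET /{INDEX}/_search")
print(json.dumps(body))

# Hand-written, trimmed response in the usual search-response shape (illustrative only).
example_response = {"hits": {"hits": [{"_id": "error-alert"}, {"_id": "db-news"}]}}
print(matching_query_ids(example_response))  # ['error-alert', 'db-news']
```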
6. Advanced: Performance Considerations and Scaling
Before reading on: do you think percolate queries scale well with thousands or millions of stored queries? Commit to your answer.
Concept: Understand how Elasticsearch optimizes percolate queries and what limits exist.
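Elasticsearch does not run every stored query against every document. When a query is registered, Elasticsearch extracts terms from it and indexes them; when a document is percolated, those terms are used to pre-select candidate queries, and only the candidates are fully checked against the document. Even so, very large numbers of stored queries, or queries from which few useful terms can be extracted, can slow matching. Filtering on metadata stored alongside the queries, spreading queries across shards, and limiting query complexity help maintain performance.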
Result
Efficient matching even with many stored queries, but requires careful design.
Understanding performance helps design scalable real-time matching systems.
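One way to apply query filtering is to store metadata next to each query and filter on it in the same search. This sketch assumes each stored-query document has a `team` keyword field mapped in the index; that field, its values, and the index name are hypothetical. The `term` clause sits in filter context, so it narrows which stored queries are considered without affecting scoring.

```python
import json

INDEX = "alerts"  # hypothetical percolator index

# Assumption: stored-query docs look like {"query": {...}, "team": "payments"}.
def filtered_percolate_body(document, team):
    """Percolate `document`, but only against stored queries owned by `team`."""
    return {
        "query": {
            "bool": {
                "must": [{"percolate": {"field": "query", "document": document}}],
                # Filter context: no scoring, just restricts the candidate stored queries.
                "filter": [{"term": {"team": team}}],
            }
        }
    }

body = filtered_percolate_body({"status": "error", "service": "checkout"}, "payments")
print(f"GET /{INDEX}/_search")
print(json.dumps(body, indent=2))
```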
7. Expert: Advanced Use Cases and Internals
Before reading on: do you think percolate queries can be combined with other query types or scripted queries? Commit to your answer.
Concept: Explore combining percolate queries with complex queries and how Elasticsearch processes them internally.
A percolate query is an ordinary query clause, so it can be nested in a bool query and combined with filters and other clauses for advanced matching logic. Internally, Elasticsearch extracts terms from each stored query when it is indexed, uses those terms to pick candidate queries for an incoming document, and then verifies each candidate against that document. Understanding this helps optimize query design and troubleshoot issues.
Result
Powerful, flexible matching capabilities beyond simple query storage.
Knowing the internals and flexibility unlocks expert-level use and optimization.
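The candidate-then-verify idea can be illustrated with a simplified Python model. It is not Elasticsearch's implementation: stored queries here are limited to exact `{field: term}` conditions, all terms are assumed lowercase, and tokenization is a whitespace split. "Index time" builds an inverted index from terms to query IDs; "percolate time" collects candidates that share a term with the document and then checks each candidate fully.

```python
from collections import defaultdict

# Simplified model only. Stored queries: every {field: term} condition must hold.
stored = {
    "q1": {"status": "error"},
    "q2": {"title": "database"},
    "q3": {"status": "warning"},
    "q4": {"status": "error", "title": "cooking"},
}

def extract_terms(queries):
    """'Index time': inverted index from (field, term) to the query IDs using it."""
    index = defaultdict(set)
    for qid, clause in queries.items():
        for field, term in clause.items():
            index[(field, term)].add(qid)
    return index

def doc_terms(document):
    """Naive tokenization of the incoming document into (field, token) pairs."""
    return {(f, tok) for f, v in document.items() for tok in str(v).lower().split()}

def percolate(document, queries, index):
    terms = doc_terms(document)
    # Phase 1: candidate selection. Only queries sharing a term with the document.
    candidates = set().union(*(index.get(t, set()) for t in terms))
    # Phase 2: verification. Every condition of a candidate must hold.
    return sorted(q for q in candidates
                  if all((f, t) in terms for f, t in queries[q].items()))

idx = extract_terms(stored)
print(percolate({"status": "error", "title": "Database down"}, stored, idx))  # ['q1', 'q2']
```

`q4` shows why verification is needed: it shares `status: error` with the document and so becomes a candidate, but its second condition fails.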
Under the Hood
Elasticsearch stores queries in a field of the special 'percolator' type. When a query is indexed, Elasticsearch parses it and extracts its terms into an index of their own. When a new document is percolated, Elasticsearch indexes that single document into a small in-memory index, uses the document's terms to select candidate queries from the extracted-term index, and then runs each candidate against the in-memory document to confirm the match.
Why designed this way?
Percolate queries were designed to invert the search process for efficiency in real-time matching scenarios. Traditional search indexes documents for queries; percolate indexes queries for documents. This design allows fast matching of many queries against incoming data, which is essential for alerting and notification systems.
┌───────────────┐      ┌───────────────┐      ┌───────────────┐
│ Stored Queries│─────▶│ Query Index   │─────▶│ Matching      │
│ (percolator)  │      │ (inverted)    │      │ Engine        │
└───────────────┘      └───────────────┘      └───────────────┘
                                                   ▲
                                                   │
                                           ┌───────────────┐
                                           │ New Document  │
                                           └───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do you think percolate queries search documents with queries like normal? Commit yes or no.
Common Belief: Percolate queries work like normal queries but just with a different name.
Reality: Percolate queries reverse the process: they search stored queries using a new document, not documents using a query.
Why it matters: Confusing this leads to wrong implementations and inefficient systems that do not leverage percolate's power.
Quick: Do you think you can store any kind of query in a percolator index? Commit yes or no.
Common Belief: Any Elasticsearch query can be stored and percolated without restrictions.
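Reality: Only certain query types are supported. Aggregations are not queries and cannot be stored in a percolator field, and queries that run against child documents, such as has_child and has_parent, are not supported because a percolate query evaluates one document at a time.
Why it matters: Trying unsupported queries causes errors and wasted development time.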
Quick: Do you think percolate queries scale easily to millions of stored queries? Commit yes or no.
Common Belief: Percolate queries scale linearly and can handle millions of stored queries without issue.
Reality: Performance degrades with very large numbers of stored queries; careful design and optimization are needed.
Why it matters: Ignoring scaling limits can cause slow response times and system failures.
Quick: Do you think percolate queries return the matching documents? Commit yes or no.
Common Belief: Percolate queries return documents that match the stored queries.
Reality: Percolate queries return the stored queries that match the given document, not the documents those queries would find.
Why it matters: Misunderstanding this leads to incorrect expectations and misuse of the API.
Expert Zone
1. Elasticsearch extracts terms from each stored query at index time and uses them to pre-select candidates, which affects how complex queries behave: a query with few extractable terms can end up as a candidate for many documents and cost more to match.
2. A percolator field holds only queries; because each field name has a single type, the same field cannot also serve as a normal text field.
3. Combining percolate queries with filters and bool queries allows building complex matching rules that can optimize performance by reducing candidate queries.
When NOT to use
Percolate queries are not suitable when you need to match documents against very complex queries involving scripts or aggregations. In such cases, consider using external processing or custom application logic. Also, if you have very few queries or documents, normal search might be simpler and more efficient.
Production Patterns
In production, percolate queries are used for real-time alerting systems, such as monitoring logs for error patterns, matching user profiles to notifications, or filtering incoming data streams. They are often combined with message queues and event-driven architectures to trigger actions when matches occur.
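The event-driven pattern reduces to a small loop: percolate each incoming event, then fire one notification per matching query. In this sketch the percolate call and the notifier are injected as functions, so the in-memory stand-ins used below (hypothetical, for illustration) could be swapped for a real Elasticsearch search and a real message-queue publisher.

```python
from typing import Callable, Iterable, List, Tuple

def process_events(
    events: Iterable[dict],
    percolate: Callable[[dict], List[str]],
    notify: Callable[[str, dict], None],
) -> int:
    """Percolate each event and send one notification per matching stored
    query. Returns the number of notifications sent."""
    sent = 0
    for event in events:
        for query_id in percolate(event):
            notify(query_id, event)
            sent += 1
    return sent

# Usage with in-memory stand-ins instead of Elasticsearch and a message queue.
def fake_percolate(event: dict) -> List[str]:
    return ["error-alert"] if event.get("status") == "error" else []

outbox: List[Tuple[str, str]] = []
count = process_events(
    [{"status": "error", "msg": "disk full"}, {"status": "ok", "msg": "fine"}],
    fake_percolate,
    lambda qid, ev: outbox.append((qid, ev["msg"])),
)
print(count, outbox)  # 1 [('error-alert', 'disk full')]
```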
Connections
Event-driven Architecture
Percolate queries enable real-time matching which triggers events in event-driven systems.
Understanding percolate queries helps design systems that react instantly to new data by firing events based on matching criteria.
Pattern Matching in Functional Programming
Both involve checking data against a set of patterns or rules to find matches.
Knowing how pattern matching works in programming clarifies how percolate queries match documents against stored query patterns.
Reverse Indexing in Information Retrieval
Percolate queries use inverted indexes of queries, similar to how search engines index documents.
Understanding reverse indexing explains how Elasticsearch efficiently finds matching queries for a document.
Common Pitfalls
#1 Mapping the query field as 'text' instead of 'percolator'.
Wrong approach: { "mappings": { "properties": { "query": { "type": "text" } } } }
Correct approach: { "mappings": { "properties": { "query": { "type": "percolator" } } } }
Root cause: Confusing normal text fields with the special 'percolator' field type needed to store queries.
#2 Sending a query instead of a document in the percolate query request.
Wrong approach: { "query": { "percolate": { "field": "query", "document": { "query": { "match": { "title": "error" } } } } } }
Correct approach: { "query": { "percolate": { "field": "query", "document": { "title": "error" } } } }
Root cause: Misunderstanding that the percolate query expects a document to match against stored queries, not another query.
#3 Storing unsupported query types like aggregations in the percolator index.
Wrong approach: { "query": { "aggs": { "avg_price": { "avg": { "field": "price" } } } } }
Correct approach: { "query": { "match": { "status": "error" } } }
Root cause: Not knowing that percolate queries only support query types that can be matched against documents.
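Pitfall #3 can be caught before a request is sent with a simple client-side check. This hypothetical helper walks a stored query and flags aggregation keys, which are never valid inside a query, plus `has_child` and `has_parent`, which the percolate query does not support because it evaluates one document at a time. It is a heuristic (a field that happens to be named `aggs` would be flagged too), and Elasticsearch remains the authority on what it accepts.

```python
# Hypothetical pre-flight check; the key list below is an assumption, not exhaustive.
UNSUPPORTED_KEYS = {"aggs", "aggregations", "has_child", "has_parent"}

def find_unsupported(node, path="query"):
    """Walk a stored query (dicts and lists) and return paths to flagged keys."""
    problems = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key in UNSUPPORTED_KEYS:
                problems.append(child)
            problems.extend(find_unsupported(value, child))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            problems.extend(find_unsupported(item, f"{path}[{i}]"))
    return problems

bad = {"query": {"aggs": {"avg_price": {"avg": {"field": "price"}}}}}
good = {"query": {"match": {"status": "error"}}}
print(find_unsupported(bad["query"]))   # ['query.aggs']
print(find_unsupported(good["query"]))  # []
```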
Key Takeaways
Percolate queries reverse the usual search process by matching stored queries against new documents.
They require a special index mapping with a 'percolator' field type to store queries as documents.
You send a document to the percolate query to find which stored queries match it, enabling real-time alerting and notifications.
Performance depends on the number and complexity of stored queries, so design and optimization are important for scaling.
Understanding percolate queries unlocks powerful use cases in monitoring, recommendation, and event-driven systems.