Highlighting matched text in Elasticsearch - Deep Dive

Overview - Highlighting matched text
What is it?
Highlighting matched text in Elasticsearch means marking the parts of each search result where the search words appear. It helps users quickly see why a document was found. Elasticsearch does this by wrapping the matched text in tags, either the default <em> or custom tags you choose, so the matches stand out.
Why it matters
Without highlighting, users might struggle to understand why a search result is relevant, especially in long documents. Highlighting improves user experience by making search results clearer and faster to scan. It also helps in debugging and refining search queries by showing exactly what matched.
Where it fits
Before learning highlighting, you should understand basic Elasticsearch search queries and how documents are indexed. After mastering highlighting, you can explore advanced search features like multi-field highlighting, custom tags, and performance tuning.
Mental Model
Core Idea
Highlighting matched text is like shining a spotlight on the exact words in a document that matched your search query.
Think of it like...
Imagine reading a book with a highlighter pen. When you search for a word, highlighting is like the pen marking all the places that word appears, so you can find them easily.
┌───────────────────────────────┐
│ Elasticsearch Search Request   │
├───────────────┬───────────────┤
│ Query         │ Highlight     │
│ (search terms)│ (fields, tags)│
└──────┬────────┴───────┬───────┘
       │                │
       ▼                ▼
┌───────────────┐  ┌───────────────┐
│ Matching Docs │  │ Highlighted   │
│ (raw text)    │  │ Snippets with │
│               │  │ <em> tags     │
└───────────────┘  └───────────────┘
Build-Up - 7 Steps
1
Foundation: Basic concept of highlighting
Concept: Highlighting marks the matched words in search results to make them visible.
When you search in Elasticsearch, you can ask it to return snippets of text where your search words appear. By default, the matched words in these snippets are wrapped in <em> tags. This helps users see why a document matched.
Result
Search results include text snippets with matched words wrapped in tags.
Understanding that highlighting visually connects search queries to results helps users trust and navigate search outputs.
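To make the idea concrete, here is a minimal Python sketch of what "wrapping matched words in tags" means. It is a toy illustration only: the function name `wrap_matches` is made up for this example, and the regular expression stands in for Elasticsearch's analysis and query matching, which work differently.

```python
import re


def wrap_matches(text, terms, pre_tag="<em>", post_tag="</em>"):
    """Wrap whole-word, case-insensitive occurrences of `terms` in tags.

    Toy stand-in for a highlighter: real highlighting relies on the
    field's analyzer and the query, not on a regular expression.
    """
    if not terms:
        return text
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre_tag + m.group(0) + post_tag, text)


snippet = wrap_matches("An apple a day keeps the doctor away.", ["apple"])
print(snippet)  # An <em>apple</em> a day keeps the doctor away.
```

The original casing of the matched word is preserved; only the surrounding tags are added.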
2
Foundation: How to enable highlighting in queries
Concept: Highlighting is enabled by adding a 'highlight' section to your search query specifying which fields to highlight.
In your Elasticsearch query JSON, add a 'highlight' object. Inside it, specify the fields you want to highlight. For example:
{
  "query": { "match": { "content": "apple" } },
  "highlight": { "fields": { "content": {} } }
}
This tells Elasticsearch to highlight matches in the 'content' field.
Result
The search response includes a 'highlight' section with snippets showing matched text with tags.
Knowing how to add highlighting to queries is the first step to making search results more user-friendly.
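The sketch below shows one way to assemble that request body in Python and to read the 'highlight' section out of a response. The helper names (`build_search_body`, `collect_highlights`) are hypothetical, and `sample_response` is hand-written to mimic the general shape of a search response (hits under `hits.hits`, fragments under each hit's `highlight` key); it is not output from a real cluster.

```python
import json


def build_search_body(field, text):
    """Build a match query that also asks for highlights on the same field."""
    return {
        "query": {"match": {field: text}},
        "highlight": {"fields": {field: {}}},
    }


def collect_highlights(response):
    """Map each hit's _id to its highlight fragments ({} when a hit has none)."""
    return {hit["_id"]: hit.get("highlight", {}) for hit in response["hits"]["hits"]}


body = build_search_body("content", "apple")
print(json.dumps(body, indent=2))

# Hand-written sample, assumed for illustration only.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_source": {"content": "An apple a day."},
                "highlight": {"content": ["An <em>apple</em> a day."]},
            },
            {"_id": "2", "_source": {"content": "Pears only."}},
        ]
    }
}

print(collect_highlights(sample_response))
```

Using `hit.get("highlight", {})` keeps the code safe for hits that come back without any highlighted fragments.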
3
Intermediate: Customizing highlight tags
🤔 Before reading on: do you think you can change the tags used for highlighting, or are they fixed to <em>? Commit to your answer.
Concept: You can customize the tags used to wrap matched text to fit your application's style.
By default, Elasticsearch uses <em> and </em> tags. You can change these by setting 'pre_tags' and 'post_tags' in the highlight section. For example, to use <mark>:
"highlight": {
  "pre_tags": ["<mark>"],
  "post_tags": ["</mark>"],
  "fields": { "content": {} }
}
This will wrap matched text in <mark> tags instead.
Result
Highlighted snippets use the custom tags instead of <em> tags around matched words.
Custom tags allow integration of highlighting with different UI styles and accessibility needs.
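As a rough sketch of how custom tags might be wired into an application, the Python below builds a highlight section with 'pre_tags' and 'post_tags' and then splits a returned fragment into plain and matched segments that a UI layer could style however it likes. The `<mark>` tag, the helper names, and the sample fragment are all assumptions chosen for illustration.

```python
def highlight_with_tags(fields, pre_tag, post_tag):
    """Build a highlight section that wraps matches in the given tag pair."""
    return {
        "pre_tags": [pre_tag],
        "post_tags": [post_tag],
        "fields": {name: {} for name in fields},
    }


def split_fragment(fragment, pre_tag, post_tag):
    """Split a highlighted fragment into (text, is_match) pairs."""
    segments = []
    rest = fragment
    while rest:
        start = rest.find(pre_tag)
        end = rest.find(post_tag, start + len(pre_tag)) if start != -1 else -1
        if start == -1 or end == -1:
            # No further complete tag pair: the remainder is plain text.
            segments.append((rest, False))
            break
        if start > 0:
            segments.append((rest[:start], False))
        segments.append((rest[start + len(pre_tag):end], True))
        rest = rest[end + len(post_tag):]
    return segments


section = highlight_with_tags(["content"], "<mark>", "</mark>")
parts = split_fragment("An <mark>apple</mark> a day", "<mark>", "</mark>")
print(section)
print(parts)  # [('An ', False), ('apple', True), (' a day', False)]
```

Splitting into segments rather than inserting the fragment as raw HTML lets the application decide how matched text is rendered.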
4
Intermediate: Highlighting multiple fields
🤔Before reading on: do you think highlighting works only on one field at a time or can it handle many fields? Commit to your answer.
Concept: Elasticsearch can highlight matches in multiple fields within the same query.
You can specify multiple fields in the 'highlight' section like this:
"highlight": {
  "fields": {
    "title": {},
    "content": {}
  }
}
This will return highlighted snippets for both the 'title' and 'content' fields if matches exist.
Result
Search results include highlights for all specified fields where matches occur.
Highlighting multiple fields helps users see relevant matches across different parts of documents.
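A small Python sketch of the multi-field case follows. It pairs a `multi_match` query with a highlight section that lists the same fields, so that any field the query searches can also return snippets. The function name and the field names are assumptions for the example.

```python
def build_multi_field_search(text, fields):
    """Search several fields and request highlights for each of them."""
    field_list = list(fields)
    return {
        "query": {"multi_match": {"query": text, "fields": field_list}},
        "highlight": {"fields": {name: {} for name in field_list}},
    }


body = build_multi_field_search("apple", ["title", "content"])
print(body["query"])
print(body["highlight"])
```

A hit that matched only in 'title' would typically carry highlight fragments for 'title' alone, so client code should not assume every requested field is present in each hit's 'highlight' section.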
5
Intermediate: Fragment size and number control
🤔Before reading on: do you think Elasticsearch returns the entire field text highlighted or just parts? Commit to your answer.
Concept: You can control how much text around the match is shown and how many snippets are returned.
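Use 'fragment_size' to set the approximate number of characters in each snippet (default 100), and 'number_of_fragments' to set the maximum number of snippets per field (default 5). For example:
"highlight": {
  "fields": {
    "content": { "fragment_size": 50, "number_of_fragments": 3 }
  }
}
This returns up to 3 snippets of roughly 50 characters each with highlights. The size is a target rather than a hard limit, because the highlighter tries to break snippets at natural boundaries.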
Result
Search results show concise snippets with highlighted matches, not full field text.
Controlling snippet size improves readability and performance by avoiding huge highlighted texts.
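The following Python sketch shows one possible way to build per-field fragment settings with basic validation. The helper names are hypothetical; the default values mirror Elasticsearch's documented defaults (100 characters, 5 fragments). Setting 'number_of_fragments' to 0 asks Elasticsearch to return the whole field content highlighted instead of fragments, which the helper permits.

```python
def fragment_options(fragment_size=100, number_of_fragments=5):
    """Return per-field highlight options after basic validation."""
    checks = (
        ("fragment_size", fragment_size),
        ("number_of_fragments", number_of_fragments),
    )
    for name, value in checks:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer, got {value!r}")
    return {"fragment_size": fragment_size, "number_of_fragments": number_of_fragments}


def highlight_section(field_options):
    """Build a highlight section from a {field_name: options} mapping."""
    return {"fields": dict(field_options)}


options = fragment_options(fragment_size=50, number_of_fragments=3)
section = highlight_section({"content": options})
print(section)
```

Validating the numbers before sending the request is a design choice for this sketch; it catches typos locally instead of relying on an error from the server.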
6
Advanced: Using 'unified' highlighter for better accuracy
🤔Before reading on: do you think all highlighters in Elasticsearch behave the same? Commit to your answer.
Concept: Elasticsearch offers different highlighters; 'unified' is the modern default with better accuracy and performance.
You can specify the highlighter type:
"highlight": {
  "type": "unified",
  "fields": { "content": {} }
}
The 'unified' highlighter handles complex queries and multi-term matches better than older types like 'plain' or 'fvh'.
Result
Highlighted snippets are more accurate and consistent with complex queries.
Choosing the right highlighter type affects the quality and speed of highlighting in production.
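Because the highlighter type can also be set per field, a request can mix types when different fields have different needs. The Python below is a minimal sketch of that idea: `build_typed_highlight` is a hypothetical helper, the field names are invented, and the set of accepted values is limited to the three types named in this guide.

```python
KNOWN_HIGHLIGHTER_TYPES = {"unified", "plain", "fvh"}


def build_typed_highlight(field_types):
    """Build a highlight section with an explicit highlighter type per field."""
    fields = {}
    for name, highlighter_type in field_types.items():
        if highlighter_type not in KNOWN_HIGHLIGHTER_TYPES:
            raise ValueError(f"unknown highlighter type for {name!r}: {highlighter_type!r}")
        fields[name] = {"type": highlighter_type}
    return {"fields": fields}


section = build_typed_highlight({"title": "plain", "content": "unified"})
print(section)
```

Spelling the type out per field makes it easy to compare two highlighters on the same query during testing, by changing one value and diffing the returned fragments.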
7
Expert: Highlighting with stored fields and performance trade-offs
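🤔 Before reading on: do you think the highlighter gets a field's original text from the inverted index, from _source, or from stored fields? Commit to your answer.
Concept: Highlighting can read from stored fields or the original source, affecting performance and storage.
By default, the highlighter loads the field's original text from _source, because the inverted index holds terms, not the text they came from. If you map a field with 'store': true, the highlighter reads that stored field instead, which avoids loading and parsing the whole _source document. Separately, the highlighter needs the character offsets of each match. Without extra index data it re-analyzes the text at query time; setting 'index_options': 'offsets' or 'term_vector': 'with_positions_offsets' in the mapping records those offsets at index time.
Trade-offs include:
- Stored fields: extra disk space, but less work per hit when _source is large.
- Offsets in postings or term vectors: a larger index, but no re-analysis, which helps on large fields.
- Neither: no extra storage, but the text is re-analyzed for every highlighted hit.
Choosing depends on your application's needs.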
Result
Highlighting behavior and performance vary based on field storage settings.
Understanding storage and indexing trade-offs helps optimize highlighting for large-scale, real-world systems.
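The mapping parameters that influence highlighting are 'store' (where the original text is read from) and 'index_options' or 'term_vector' (whether match offsets are recorded at index time). The Python sketch below writes out a hypothetical mapping and summarizes, per field, where a highlighter would find its inputs. The field names and the helper `describe_highlight_inputs` are invented, and the precedence between postings and term vectors is a simplification of what Elasticsearch actually does.

```python
# Hypothetical index mapping; the field names are invented for illustration.
index_body = {
    "mappings": {
        "properties": {
            "summary": {"type": "text"},
            "body": {"type": "text", "store": True, "index_options": "offsets"},
            "notes": {"type": "text", "term_vector": "with_positions_offsets"},
        }
    }
}


def describe_highlight_inputs(field_mapping):
    """Summarize where the text and the match offsets would come from."""
    text_source = "stored field" if field_mapping.get("store") else "_source"
    if field_mapping.get("index_options") == "offsets":
        offsets = "postings"
    elif field_mapping.get("term_vector") == "with_positions_offsets":
        offsets = "term vectors"
    else:
        # Nothing recorded at index time: the text is analyzed again per hit.
        offsets = "re-analysis"
    return {"text": text_source, "offsets": offsets}


summary = {
    name: describe_highlight_inputs(field_mapping)
    for name, field_mapping in index_body["mappings"]["properties"].items()
}
for name, inputs in summary.items():
    print(name, inputs)
```

A table like this can help when reviewing a mapping: fields that are large and frequently highlighted are the usual candidates for recording offsets at index time.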
Under the Hood
Elasticsearch builds an inverted index mapping terms to documents. When a search query runs, the index identifies the matching documents. For each hit that is returned, the highlighter then loads the field's original text (from _source or a stored field), locates the character offsets of the matched terms, extracts snippets around those offsets, and wraps the matched terms with tags. Different highlighters find offsets differently: 'plain' re-analyzes the text into a small in-memory index and runs the query against it, 'fvh' (fast vector highlighter) requires term vectors with positions and offsets, and 'unified' can take offsets from postings, from term vectors, or from re-analysis, depending on the mapping.
Why designed this way?
Highlighting was designed to improve user experience by showing context for matches. Early versions used simple methods that were fast but inaccurate. As search needs grew complex, Elasticsearch introduced multiple highlighters to balance speed, accuracy, and resource use. The 'unified' highlighter was created to unify the best features and handle complex queries better.
┌───────────────┐
│ Search Query  │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Inverted Index│
│ (term → docs) │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Match Positions│
│ in Documents  │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Extract Snippets│
│ Around Matches │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Wrap with Tags │
│ (<em>, etc.)   │
└───────────────┘
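The last three boxes of the pipeline (match positions, snippet extraction, tag wrapping) can be sketched in a few lines of Python. This is a toy model under simplifying assumptions: matching is whole-word and case-insensitive, only one fragment is built around the first match, and the window is cut at character positions rather than at sentence or word boundaries. The function names are made up for the example and do not correspond to Elasticsearch internals.

```python
import re


def find_match_offsets(text, terms):
    """Return (start, end) character offsets of whole-word matches."""
    wanted = {term.lower() for term in terms}
    return [
        (m.start(), m.end())
        for m in re.finditer(r"\w+", text)
        if m.group(0).lower() in wanted
    ]


def extract_fragment(text, offsets, fragment_size=40, pre_tag="<em>", post_tag="</em>"):
    """Build one fragment centred on the first match and tag the matches in it."""
    if not offsets:
        return None
    first_start, first_end = offsets[0]
    pad = max(0, fragment_size - (first_end - first_start)) // 2
    lo = max(0, first_start - pad)
    hi = min(len(text), first_end + pad)
    pieces = []
    cursor = lo
    for start, end in offsets:
        if start < lo or end > hi:
            continue  # match lies outside this fragment's window
        pieces.append(text[cursor:start])
        pieces.append(pre_tag + text[start:end] + post_tag)
        cursor = end
    pieces.append(text[cursor:hi])
    return "".join(pieces)


text = "Fresh apple pie"
offsets = find_match_offsets(text, ["apple"])
print(offsets)                          # [(6, 11)]
print(extract_fragment(text, offsets))  # Fresh <em>apple</em> pie
```

On a long text the same call returns only the window around the first match, which is why highlights are snippets rather than whole fields.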
Myth Busters - 4 Common Misconceptions
Quick: Does highlighting always return the full field text with matches highlighted? Commit to yes or no.
Common Belief: Highlighting returns the entire field text with matched words highlighted.
Reality: Highlighting returns only small snippets around matched terms, not the full field content by default.
Why it matters: Expecting full text can lead to confusion and performance issues when large fields are involved.
Quick: Do you think you can highlight fields that are not stored or indexed? Commit to yes or no.
Common Belief: You can highlight any field regardless of how it is stored or indexed.
Reality: Highlighting needs the field's original text, which it reads from _source or from a stored field, and the 'fvh' highlighter additionally requires term vectors. If the text is unavailable (for example, _source is disabled and the field is not stored), there is nothing to highlight.
Why it matters: Trying to highlight unsupported fields leads to empty or missing highlights, confusing users.
Quick: Is the default highlighter always the best choice for all queries? Commit to yes or no.
Common Belief: The default highlighter works perfectly for all types of queries and data.
Reality: Different highlighters perform better depending on query complexity and field settings; 'unified' is best for most but not all cases.
Why it matters: Using the wrong highlighter can cause inaccurate highlights or slow query performance.
Quick: Does changing highlight tags affect search results or only display? Commit to your answer.
Common Belief: Changing highlight tags changes which documents are matched or returned.
Reality: Highlight tags only affect how matched text is displayed, not which documents match the query.
Why it matters: Misunderstanding this can lead to incorrect debugging or query design.
Expert Zone
1
Highlighting performance can degrade significantly on large text fields that have no offsets in postings or term vectors, because the text must be re-analyzed for every highlighted hit. This calls for careful index design.
2
The 'unified' highlighter can take match offsets from postings, from term vectors, or from re-analysis to balance speed and accuracy, but it may behave differently on analyzed fields with complex tokenization.
3
Customizing fragment size and number can impact user experience and server load; too many or too large fragments slow down responses and overwhelm users.
When NOT to use
Highlighting is not suitable when you need to display entire documents or when performance is critical and highlighting overhead is too high. Alternatives include client-side highlighting or pre-processing documents to mark important terms.
Production Patterns
In production, highlighting is often combined with multi-match queries across several fields, custom tags for UI consistency, and caching strategies to reduce overhead. Large systems may disable highlighting on very large fields or use stored fields selectively to optimize performance.
Connections
Text Search Relevance Scoring
Highlighting builds on the same term matching logic used in relevance scoring.
Understanding how Elasticsearch scores documents helps explain why certain terms are highlighted and how relevance and highlighting are linked.
User Interface Design
Highlighting directly affects how search results are presented to users.
Knowing UI principles helps design highlight tags and snippet sizes that improve readability and accessibility.
Compiler Syntax Highlighting
Both highlight matched tokens in text to improve understanding.
Recognizing that highlighting in search and code editors share the goal of emphasizing important text helps appreciate the challenges of accurate token detection.
Common Pitfalls
#1 Expecting full field text in highlights and setting fragment_size too large.
Wrong approach: { "highlight": { "fields": { "content": { "fragment_size": 10000 } } } }
Correct approach: { "highlight": { "fields": { "content": { "fragment_size": 150, "number_of_fragments": 3 } } } }
Root cause: Misunderstanding that highlighting returns snippets, not full text, leading to performance issues and unwieldy results.
#2 Trying to highlight a field whose original text is not available.
Wrong approach: { "highlight": { "fields": { "non_indexed_field": {} } } }
Correct approach: Make sure the field is a searchable text field whose content is kept in _source or stored, then highlight it. The mapping and the search are separate requests.
Mapping (set when creating the index): { "mappings": { "properties": { "field": { "type": "text", "store": true } } } }
Search request: { "query": { "match": { "field": "apple" } }, "highlight": { "fields": { "field": {} } } }
Root cause: Not knowing that the highlighter needs the field's original text (from _source or a stored field) to extract matched snippets.
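#3 Leaving the highlighter choice implicit for complex queries without testing accuracy.
Wrong approach: { "highlight": { "fields": { "content": {} } } }
Correct approach: { "highlight": { "type": "unified", "fields": { "content": {} } } }
Because 'unified' is already the default in recent versions, setting it makes the choice explicit rather than changing behavior. The important step is to check the returned snippets against your real queries and compare with 'plain' or 'fvh' if they look wrong.
Root cause: Assuming default settings are optimal for all queries, ignoring query complexity and field analysis.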
Key Takeaways
Highlighting matched text helps users quickly see why search results are relevant by marking matched words in snippets.
You enable highlighting by adding a 'highlight' section in your Elasticsearch query specifying fields and options.
Customizing tags, fragment size, and number of fragments improves user experience and fits your application's style.
Different highlighters exist; the 'unified' highlighter is the modern choice for better accuracy and performance.
Highlighting depends on how fields are indexed and stored; understanding these trade-offs is key for production use.