Overview - Highlighting matched text

What is it?

Highlighting matched text in Elasticsearch means showing the parts of each search result where the search terms appear. It helps users quickly see why a document was found by marking the matching words or phrases. This is done by wrapping the matched text in tags, <em> by default or custom tags you choose, so that it stands out.

Why it matters

Without highlighting, users might struggle to understand why a search result is relevant, especially in long documents. Highlighting improves user experience by making search results clearer and faster to scan. It also helps in debugging and refining search queries by showing exactly what matched.

Where it fits

Before learning highlighting, you should understand basic Elasticsearch search queries and how documents are indexed. After mastering highlighting, you can explore advanced search features like multi-field highlighting, custom tags, and performance tuning.

Mental Model

Core Idea

Highlighting matched text is like shining a spotlight on the exact words in a document that matched your search query.

Think of it like...

Imagine reading a book with a highlighter pen. When you search for a word, highlighting is like the pen marking all the places that word appears, so you can find them easily.

┌───────────────────────────────┐
│ Elasticsearch Search Request  │
├───────────────┬───────────────┤
│ Query         │ Highlight     │
│ (search terms)│ (fields, tags)│
└──────┬────────┴───────┬───────┘
       │                │
       ▼                ▼
┌───────────────┐  ┌───────────────┐
│ Matching Docs │  │ Highlighted   │
│ (raw text)    │  │ Snippets with │
│               │  │ <em> tags     │
└───────────────┘  └───────────────┘

Build-Up - 7 Steps

1

Foundation - Basic concept of highlighting

Concept: Highlighting marks the matched words in search results to make them visible.

When you search in Elasticsearch, you can ask it to return snippets of text where your search words appear. In these snippets the matched words are wrapped in <em> tags by default. This helps users see why a document matched.

Result

Search results include text snippets with matched words wrapped in tags.

Understanding that highlighting visually connects search queries to results helps users trust and navigate search outputs.
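To make the result concrete, here is a minimal sketch of reading highlight fragments out of a search response. The response below is a hand-written illustration of the usual shape (each hit carries a "highlight" object mapping field names to lists of fragments); the document text, the id, and the helper name collect_highlights are made up for this example.

```python
# Illustrative response; the document, its id, and its text are invented.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_source": {"content": "Elasticsearch makes full-text search fast."},
                "highlight": {
                    "content": ["Elasticsearch makes full-text <em>search</em> fast."]
                },
            }
        ]
    }
}


def collect_highlights(response, field):
    """Return {doc_id: [fragments]} for hits that have highlights on `field`."""
    result = {}
    for hit in response["hits"]["hits"]:
        # Hits without a match in `field` simply have no entry for it.
        fragments = hit.get("highlight", {}).get(field)
        if fragments:
            result[hit["_id"]] = fragments
    return result


print(collect_highlights(sample_response, "content"))
```

Note that the fragments arrive next to the original _source, not in place of it, so the application decides which one to display.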

2

Foundation - How to enable highlighting in queries
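As a minimal sketch, enabling highlighting means adding a "highlight" section next to the "query" in the search request body. The helper name build_search_body and the field name "content" are assumptions for illustration; the resulting dictionary is what you would send as JSON to the _search endpoint.

```python
import json


def build_search_body(field, text):
    """Build a search request body that matches `text` in `field` and highlights it."""
    return {
        "query": {"match": {field: text}},
        # An empty object per field means "highlight with default settings".
        "highlight": {"fields": {field: {}}},
    }


body = build_search_body("content", "search")
print(json.dumps(body, indent=2))
```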

3

Intermediate - Customizing highlight tags
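Custom tags are set with the pre_tags and post_tags options of the highlight section, each given as a list. The sketch below assumes a <mark> tag and a field called "content"; the helper name with_custom_tags is hypothetical.

```python
def with_custom_tags(body, pre_tag, post_tag):
    """Return a copy of `body` whose highlight section uses the given tags."""
    highlight = dict(body.get("highlight", {}))
    highlight["pre_tags"] = [pre_tag]    # inserted before each match
    highlight["post_tags"] = [post_tag]  # inserted after each match
    return {**body, "highlight": highlight}


base = {
    "query": {"match": {"content": "search"}},
    "highlight": {"fields": {"content": {}}},
}
tagged = with_custom_tags(base, "<mark>", "</mark>")
print(tagged["highlight"])
```

The tags only change how matches are marked in the returned fragments; the query, and therefore the set of matching documents, stays the same.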

4

Intermediate - Highlighting multiple fields
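To highlight several fields, list each one under "fields" in the highlight section. The following sketch pairs that with a multi_match query over the same fields; the field names "title" and "content" and the helper name are assumptions for illustration.

```python
def build_multi_field_body(fields, text):
    """Search `text` across `fields` and request highlights for each of them."""
    fields = list(fields)
    return {
        "query": {"multi_match": {"query": text, "fields": fields}},
        # One entry per field; each could carry its own options instead of {}.
        "highlight": {"fields": {name: {} for name in fields}},
    }


multi_body = build_multi_field_body(["title", "content"], "search")
print(multi_body["highlight"])
```

In the response, each hit's "highlight" object then contains an entry only for the fields that actually produced fragments.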

5

Intermediate - Fragment size and number control
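Fragment length and count are controlled per field with fragment_size (in characters) and number_of_fragments. The sketch below assumes the documented defaults of 100 and 5; check them against your Elasticsearch version. The helper name with_fragment_options is hypothetical.

```python
def with_fragment_options(highlight, field, fragment_size=100, number_of_fragments=5):
    """Return a copy of `highlight` with fragment settings applied to one field."""
    if fragment_size <= 0 or number_of_fragments < 0:
        raise ValueError("fragment_size must be positive, number_of_fragments non-negative")
    fields = dict(highlight.get("fields", {}))
    fields[field] = {
        **fields.get(field, {}),
        "fragment_size": fragment_size,
        "number_of_fragments": number_of_fragments,
    }
    return {**highlight, "fields": fields}


base_highlight = {"fields": {"content": {}}}
# Up to three snippets of roughly 150 characters each.
snippets = with_fragment_options(base_highlight, "content", 150, 3)
# number_of_fragments = 0 asks for the whole field with matches highlighted.
whole_field = with_fragment_options(base_highlight, "content", number_of_fragments=0)
print(snippets)
print(whole_field)
```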

6

Advanced - Using 'unified' highlighter for better accuracy
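The highlighter is selected with the "type" option, either for the whole highlight section or per field. A minimal sketch, assuming the three built-in types 'unified', 'plain', and 'fvh'; the helper name and the validation step are illustrative, not part of any client library.

```python
BUILT_IN_TYPES = {"unified", "plain", "fvh"}


def with_highlighter_type(highlight, highlighter_type):
    """Return a copy of `highlight` that requests a specific highlighter."""
    if highlighter_type not in BUILT_IN_TYPES:
        raise ValueError(f"unknown highlighter type: {highlighter_type!r}")
    return {**highlight, "type": highlighter_type}


unified = with_highlighter_type({"fields": {"content": {}}}, "unified")
print(unified)
```

Building the same request once per type and comparing the returned fragments is a simple way to see how the highlighters differ on your own data.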

7

Expert - Highlighting with stored fields and performance trade-offs
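The trade-offs are decided in the index mapping, not in the search request. The sketch below builds a text field mapping for three ways of giving the highlighter match offsets; the strategy labels ("reanalyze", "postings", "term_vectors") are names invented for this example, while index_options, term_vector, and store are mapping parameters of text fields.

```python
# Hypothetical labels mapped to real mapping parameters.
OFFSET_STRATEGIES = {
    # No extra index data: the field text is re-analyzed at query time.
    "reanalyze": {},
    # Offsets kept in the postings list; usable by the unified highlighter.
    "postings": {"index_options": "offsets"},
    # Term vectors with offsets; required by the fvh highlighter, costs more disk.
    "term_vectors": {"term_vector": "with_positions_offsets"},
}


def text_field_mapping(strategy, store=False):
    """Build a text field mapping for the chosen offset strategy."""
    mapping = {"type": "text", **OFFSET_STRATEGIES[strategy]}
    if store:
        # Keep a separate stored copy so the text need not be loaded from _source.
        mapping["store"] = True
    return mapping


index_body = {
    "mappings": {"properties": {"content": text_field_mapping("term_vectors")}}
}
print(index_body)
```

A body like index_body is sent when the index is created; changing these settings for an existing field generally means reindexing.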

Under the Hood

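Elasticsearch builds an inverted index mapping terms to documents. When a search query runs, it finds the matching documents. To highlight them, the highlighter needs two things: the original field text, which it reads from a stored field or from _source, and the character offsets of the matching terms within that text. It then extracts snippets around these offsets and wraps the matched terms in tags. The highlighters differ mainly in how they obtain offsets: 'plain' re-analyzes the field text in a small in-memory index at query time; 'fvh' (fast vector highlighter) requires term vectors with positions and offsets; and 'unified', built on Lucene's Unified Highlighter, uses offsets from the postings list or from term vectors when they are available and otherwise re-analyzes the text.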

Why designed this way?

Highlighting was designed to improve user experience by showing context for matches. Early versions used simple methods that were fast but inaccurate. As search needs grew complex, Elasticsearch introduced multiple highlighters to balance speed, accuracy, and resource use. The 'unified' highlighter was created to unify the best features and handle complex queries better.

┌──────────────────┐
│ Search Query     │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ Inverted Index   │
│ (term → docs)    │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ Match Positions  │
│ in Documents     │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ Extract Snippets │
│ Around Matches   │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ Wrap with Tags   │
│ (<em>, etc.)     │
└──────────────────┘
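The last three boxes of this pipeline can be imitated in a few lines of plain Python. This is a toy sketch, not how Elasticsearch or Lucene implement highlighting: it splits text with a regular expression instead of an analyzer, cuts fragments by character count instead of at sentence or word boundaries, and does not merge overlapping fragments. The function name toy_highlight and the sample sentence are made up.

```python
import re


def toy_highlight(text, terms, fragment_size=40, pre_tag="<em>", post_tag="</em>"):
    """Find term offsets, cut a fragment around each match, wrap the matches."""
    wanted = {t.lower() for t in terms}
    # Match positions: (start, end) character offsets of words that match a term.
    offsets = [
        (m.start(), m.end())
        for m in re.finditer(r"\w+", text)
        if m.group().lower() in wanted
    ]
    fragments = []
    for start, end in offsets:
        # Extract a window of roughly fragment_size characters around the match.
        pad = max(0, (fragment_size - (end - start)) // 2)
        lo, hi = max(0, start - pad), min(len(text), end + pad)
        fragment = text[lo:hi]
        # Wrap matches inside the window, right to left so that inserting
        # tags does not shift the offsets still to be processed.
        inside = [(s, e) for s, e in offsets if s >= lo and e <= hi]
        for s, e in reversed(inside):
            fragment = (
                fragment[: s - lo] + pre_tag + fragment[s - lo : e - lo]
                + post_tag + fragment[e - lo :]
            )
        fragments.append(fragment)
    return fragments


sentence = "The quick brown fox jumps over the lazy dog"
print(toy_highlight(sentence, ["fox"], fragment_size=15))
# → ['brown <em>fox</em> jumps']
```

Even this toy version shows why offsets matter: without them the only option is to scan the text again for every document that needs highlighting.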

Myth Busters - 4 Common Misconceptions

Quick: Does highlighting always return the full field text with matches highlighted? Commit to yes or no.

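Common Belief: Highlighting returns the entire field text with matched words highlighted.

Reality: By default, highlighting returns short fragments around the matches, not the whole field. To get the full field text with matches highlighted, set number_of_fragments to 0.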

Quick: Do you think you can highlight fields that are not stored or indexed? Commit to yes or no.

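Common Belief: You can highlight any field regardless of how it is stored or indexed.

Reality: The highlighter needs the field's original text, which it reads from a stored field or from _source. If neither is available, there is nothing to highlight. The 'fvh' highlighter additionally requires term vectors with positions and offsets.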

Quick: Is the default highlighter always the best choice for all queries? Commit to yes or no.

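Common Belief: The default highlighter works perfectly for all types of queries and data.

Reality: The default highlighter ('unified' in current Elasticsearch versions) suits most cases, but very large fields or complex queries may call for different settings or another highlighter, so test the output on your own data.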

Quick: Does changing highlight tags affect search results or only display? Commit to your answer.

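Common Belief: Changing highlight tags changes which documents are matched or returned.

Reality: Highlight tags only change how matches are marked in the returned fragments. They have no effect on which documents match or how they are scored.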

Expert Zone

1

Highlighting performance can degrade significantly on large text fields that have neither term vectors nor offsets in the postings list, because the text must then be re-analyzed at query time. This calls for careful index design.

2

The 'unified' highlighter uses a combination of postings and offsets to balance speed and accuracy, but it may behave differently on analyzed fields with complex tokenization.

3

Customizing fragment size and number can impact user experience and server load; too many or too large fragments slow down responses and overwhelm users.

When NOT to use

Highlighting is not suitable when you need to display entire documents or when performance is critical and highlighting overhead is too high. Alternatives include client-side highlighting or pre-processing documents to mark important terms.

Production Patterns

In production, highlighting is often combined with multi-match queries across several fields, custom tags for UI consistency, and caching strategies to reduce overhead. Large systems may disable highlighting on very large fields or use stored fields selectively to optimize performance.

Connections

Text Search Relevance Scoring

Highlighting builds on the same term matching logic used in relevance scoring.

Understanding how Elasticsearch scores documents helps explain why certain terms are highlighted and how relevance and highlighting are linked.

User Interface Design

Highlighting directly affects how search results are presented to users.

Knowing UI principles helps design highlight tags and snippet sizes that improve readability and accessibility.

Compiler Syntax Highlighting

Both highlight matched tokens in text to improve understanding.

Recognizing that highlighting in search and code editors share the goal of emphasizing important text helps appreciate the challenges of accurate token detection.

Common Pitfalls

#1 Expecting full field text in highlights and setting fragment_size too large.

Wrong approach: { "highlight": { "fields": { "content": { "fragment_size": 10000 } } } }

Correct approach: { "highlight": { "fields": { "content": { "fragment_size": 150, "number_of_fragments": 3 } } } }

Root cause: Misunderstanding that highlighting returns snippets, not full text, leading to performance issues and unwieldy results.

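#2 Trying to highlight a field that is not indexed or stored.

Wrong approach: { "highlight": { "fields": { "non_indexed_field": {} } } }

Correct approach: Make sure the field is indexed and that its text is available, either in _source or as a stored field, and consider term vectors for large fields. The mapping and the highlight request are two separate bodies. Index mapping: { "mappings": { "properties": { "field": { "type": "text", "store": true } } } } Search request: { "highlight": { "fields": { "field": {} } } }

Root cause: Not knowing that the highlighter needs the field's original text (from _source or a stored field), and that some highlighters also need extra index data such as term vectors.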

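#3 Using default highlighter for complex queries without testing accuracy.

Wrong approach: { "highlight": { "fields": { "content": {} } } }

Correct approach: Choose the highlighter deliberately and check its output against your queries, for example { "highlight": { "type": "unified", "fields": { "content": {} } } }. In current Elasticsearch versions 'unified' is already the default, so setting it explicitly documents the choice rather than changing behavior; the important step is comparing the fragments it returns with those of 'plain' or 'fvh' on your data.

Root cause: Assuming default settings are optimal for all queries, ignoring query complexity and field analysis.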

Key Takeaways

Highlighting matched text helps users quickly see why search results are relevant by marking matched words in snippets.

You enable highlighting by adding a 'highlight' section in your Elasticsearch query specifying fields and options.

Customizing tags, fragment size, and number of fragments improves user experience and fits your application's style.

Different highlighters exist; 'unified' is the default in current Elasticsearch versions and the usual choice for accuracy and performance.

Highlighting depends on how fields are indexed and stored; understanding these trade-offs is key for production use.