
Enrich processor in Elasticsearch - Deep Dive

Overview - Enrich processor
What is it?
The Enrich processor in Elasticsearch adds extra information to documents as they are ingested. It looks up data in a dedicated lookup index, built from an enrich policy, and merges matching fields into the document. This lets you enhance or complete documents with related details without storing all data in one place.
Why it matters
Without the Enrich processor, you would need to store all related data inside each document or perform costly joins at query time, which slows down searches. The Enrich processor solves this by enriching documents during ingestion, making searches faster and more efficient. This improves performance and reduces storage duplication.
Where it fits
Before learning about the Enrich processor, you should understand Elasticsearch basics like indexing and ingest pipelines. After mastering it, you can explore advanced data enrichment techniques, such as using scripted processors or integrating with external databases for enrichment.
Mental Model
Core Idea
The Enrich processor adds extra data to documents by looking up matching information from a separate data source during ingestion.
Think of it like...
Imagine mailing a letter and adding a sticker with extra info about the recipient from a separate address book before sending it out.
┌───────────────┐      ┌───────────────┐
│ Incoming Doc  │─────▶│ Enrich Policy │
│ (partial data)│      │ (lookup data) │
└───────────────┘      └───────────────┘
         │                    ▲
         │                    │
         ▼                    │
┌────────────────────────────┐
│ Enriched Document (merged) │
└────────────────────────────┘
Build-Up - 7 Steps
1
Foundation: What is an Enrich Policy?
Concept: An enrich policy defines the data source and matching rules used to add information to documents.
An enrich policy is created by specifying a source index and a matching field. This policy builds a special index that stores the data to be used for enrichment. For example, a policy might use a customer database to add customer details to logs.
Result
You get a ready-to-use enrich index that the Enrich processor can query during ingestion.
Understanding enrich policies is key because they hold the data that will be merged into your documents, separating enrichment data from your main data.
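As a concrete sketch, a match-type policy built over a hypothetical customers index might be defined like this (the index, field, and policy names are illustrative):

```json
PUT /_enrich/policy/customer_policy
{
  "match": {
    "indices": "customers",
    "match_field": "customer_id",
    "enrich_fields": ["name", "email", "tier"]
  }
}
```

Creating the policy only stores its definition; the enrich index itself is built later by the execute API.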
2
Foundation: How the Enrich Processor Works in Pipelines
Concept: The Enrich processor is part of an ingest pipeline that modifies documents before indexing.
When a document passes through an ingest pipeline with an Enrich processor, Elasticsearch looks up the enrich index using the document's matching field. If a match is found, the processor adds fields from the enrich index to the document.
Result
Documents entering Elasticsearch are automatically enhanced with extra data before storage.
Knowing that enrichment happens during ingestion helps you design pipelines that improve search speed by avoiding runtime joins.
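To illustrate, an ingest pipeline referencing a hypothetical customer_policy could look like this (all names are illustrative):

```json
PUT /_ingest/pipeline/enrich_logs
{
  "processors": [
    {
      "enrich": {
        "policy_name": "customer_policy",
        "field": "customer_id",
        "target_field": "customer"
      }
    }
  ]
}
```

Documents indexed through this pipeline (for example, PUT logs/_doc/1?pipeline=enrich_logs) then arrive with the matched customer fields merged in under the customer key.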
3
Intermediate: Configuring Enrich Processor Fields
🤔Before reading on: do you think you can enrich multiple fields at once or only one field per processor? Commit to your answer.
Concept: You can specify which fields to add or overwrite in the document from the enrich index.
The Enrich processor configuration lets you set 'policy_name' (the enrich policy to use), 'field' (the document field to match on), 'target_field' (where the enriched data is placed), and 'max_matches' (how many matching enrich documents to include; values above 1 store an array in the target field). To enrich based on several different source fields, add one Enrich processor per field.
Result
You control exactly what data is added and where, avoiding unwanted overwrites or data clutter.
Understanding field mapping in enrichment prevents data conflicts and keeps your documents clean and consistent.
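A sketch of a more fully configured processor, assuming the same hypothetical customer_policy:

```json
{
  "enrich": {
    "policy_name": "customer_policy",
    "field": "customer_id",
    "target_field": "customer",
    "max_matches": 1,
    "override": false,
    "ignore_missing": true
  }
}
```

Here "override": false preserves any existing customer field instead of overwriting it, and "ignore_missing": true lets documents without a customer_id pass through untouched.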
4
Intermediate: Managing the Enrich Policy Lifecycle
🤔Before reading on: do you think enrich policies update automatically with source data changes or require manual refresh? Commit to your answer.
Concept: Enrich policies must be executed to refresh their data after source changes.
After creating or updating an enrich policy, you run the 'execute' API to build or rebuild the enrich index. This step is manual and must be repeated whenever the source data changes to keep enrichment accurate.
Result
Your enrich data stays up-to-date only when you refresh the policy, ensuring correct enrichment.
Knowing the manual refresh requirement avoids stale data problems and ensures your enrichment reflects current information.
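The refresh itself is a single API call; rerun it whenever the source index (here, the hypothetical customer_policy's source) changes:

```json
POST /_enrich/policy/customer_policy/_execute
```

This rebuilds the enrich index from the current contents of the source index; documents ingested before the rerun keep their old enrichment.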
5
Advanced: Performance Considerations of the Enrich Processor
🤔Before reading on: do you think enrichment slows down ingestion significantly or has minimal impact? Commit to your answer.
Concept: Enrich processor adds some overhead but is optimized for fast lookups using the enrich index.
The enrich index is optimized for quick key-value lookups, so enrichment is faster than runtime joins. However, large enrich indices or complex matching can slow ingestion. Proper sizing and limiting max_matches help maintain performance.
Result
You get faster searches with a small ingestion cost, balancing speed and resource use.
Understanding performance tradeoffs helps you design efficient pipelines and avoid bottlenecks.
6
Expert: Advanced Use - Chaining Enrich Processors
🤔Before reading on: do you think you can chain multiple enrich processors in one pipeline to enrich from different sources? Commit to your answer.
Concept: You can chain multiple enrich processors to enrich documents from different enrich policies sequentially.
By adding multiple enrich processors in an ingest pipeline, each referencing a different enrich policy, you can enrich documents with various data sets. This allows complex enrichment scenarios, like adding customer info and product details in one pass.
Result
Documents are enriched with multiple layers of data, improving search relevance and analytics.
Knowing how to chain enrich processors unlocks powerful multi-source enrichment strategies for complex data needs.
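As a sketch, a pipeline chaining two hypothetical policies might look like this (all names are illustrative):

```json
PUT /_ingest/pipeline/enrich_orders
{
  "processors": [
    {
      "enrich": {
        "policy_name": "customer_policy",
        "field": "customer_id",
        "target_field": "customer"
      }
    },
    {
      "enrich": {
        "policy_name": "product_policy",
        "field": "product_id",
        "target_field": "product"
      }
    }
  ]
}
```

The processors run in order, so a single order document ends up with both a customer and a product block, each drawn from its own enrich index.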
7
Expert: Internal Mechanics of Enrich Index Lookup
🤔Before reading on: do you think enrich lookups scan the entire enrich index or use optimized data structures? Commit to your answer.
Concept: Enrich lookups use specialized data structures for fast key-based retrieval.
The enrich index is a read-only system index built from the policy's source data and force-merged into a single segment, which makes exact term lookups on the match field very fast. Lookups query this compact index directly rather than scanning the source data.
Result
Lookups are very fast and scalable even with large enrich data sets.
Understanding the internal lookup mechanism explains why the Enrich processor is efficient and how to optimize enrich index design.
Under the Hood
The Enrich processor queries a dedicated enrich index built from an enrich policy. This index stores key-value pairs optimized for exact-match lookups. During ingestion, the processor uses the document's matching field to quickly retrieve matching enrich data from this index and merges it into the document before indexing.
Why designed this way?
Elasticsearch designed the Enrich processor to avoid costly runtime joins by pre-building a fast lookup index. This design balances ingestion speed and query performance, allowing enrichment without duplicating data or slowing searches. Alternatives like runtime joins were rejected due to poor scalability.
┌───────────────┐       ┌───────────────────┐       ┌───────────────┐
│ Document In   │──────▶│ Enrich Processor  │──────▶│ Enrich Index  │
│ (with key)    │       │ (lookup & merge)  │       │ (fast lookup) │
└───────────────┘       └───────────────────┘       └───────────────┘
         │                        │                         ▲
         │                        │                         │
         ▼                        ▼                         │
┌────────────────────────────────────────────────────────────┐
│                 Enriched Document Indexed                  │
└────────────────────────────────────────────────────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does the Enrich processor update enrichment data automatically when source data changes? Commit to yes or no.
Common Belief: The Enrich processor automatically updates enrichment data whenever the source index changes.
Reality: Enrich policies require manual execution to refresh the enrich index after source data changes; updates are not automatic.
Why it matters: Assuming automatic updates leads to stale enrichment data, causing inaccurate or outdated document information.
Quick: Can the Enrich processor perform fuzzy or partial matches? Commit to yes or no.
Common Belief: The Enrich processor can perform fuzzy or partial matches to enrich documents with approximate data.
Reality: The match policy type supports only exact term matches on the specified field; fuzzy or partial matching is not supported (separate geo_match and range policy types handle spatial and range lookups).
Why it matters: Expecting fuzzy matching can cause enrichment failures or missing data, leading to incomplete document enrichment.
Quick: Does using the Enrich processor eliminate the need for any joins at query time? Commit to yes or no.
Common Belief: Using the Enrich processor completely removes the need for joins during search queries.
Reality: While it reduces many join needs by enriching at ingestion, some complex joins or relationships may still require runtime joins or other methods.
Why it matters: Overreliance on enrichment can cause design blind spots where necessary query-time joins are overlooked, affecting search accuracy.
Quick: Is the Enrich processor suitable for very large enrich data sets without performance impact? Commit to yes or no.
Common Belief: The Enrich processor handles very large enrich data sets with no significant performance impact on ingestion.
Reality: Large enrich indices can slow ingestion and increase resource use; careful sizing and limits are needed.
Why it matters: Ignoring performance limits can cause slow ingestion pipelines and resource exhaustion in production.
Expert Zone
1
Enrich indices are optimized for exact-match lookups but do not support complex queries or aggregations, so enrichment data must be carefully structured.
2
The enrich processor writes matched data to the target field, overwriting any existing value there by default (the 'override' option controls this), so field naming conflicts can cause silent data loss if not managed.
3
Chaining multiple enrich processors can cause unexpected overwrites if target fields overlap, requiring careful pipeline design.
When NOT to use
Avoid using the Enrich processor when enrichment data changes very frequently and requires real-time updates; consider runtime joins or application-side enrichment instead. Also, if fuzzy or partial matching is needed, use alternative methods like scripted queries or external processing.
Production Patterns
In production, enrich processors are often used to add user profile data to logs, product details to sales events, or geo information to IP addresses. Pipelines typically include error handling and conditional enrichment to handle missing matches gracefully.
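A hedged sketch of such a production pipeline, with conditional execution and error handling (the policy and field names are illustrative):

```json
PUT /_ingest/pipeline/enrich_events
{
  "processors": [
    {
      "enrich": {
        "policy_name": "geo_policy",
        "field": "client_ip",
        "target_field": "client_geo",
        "if": "ctx.client_ip != null",
        "on_failure": [
          {
            "set": {
              "field": "enrich_error",
              "value": "geo lookup failed"
            }
          }
        ]
      }
    }
  ]
}
```

The "if" condition skips enrichment for documents without an IP field, and "on_failure" records the problem on the document instead of rejecting it.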
Connections
Database Join Operations
The Enrich processor performs a form of join at ingestion time, similar to SQL joins done at query time.
Understanding database joins helps grasp how enrichment merges related data, but Enrich processor shifts this work to ingestion for faster searches.
Cache Systems
Enrich indices act like a cache of lookup data optimized for fast retrieval during ingestion.
Knowing cache principles clarifies why enrich indices improve performance by avoiding repeated expensive lookups.
Supply Chain Management
Enriching documents is like adding supplier details to product shipments before delivery.
This connection shows how enrichment adds value by combining core data with related info early in a process, improving efficiency downstream.
Common Pitfalls
#1: Not refreshing the enrich policy after source data changes.
Wrong approach:
PUT /_enrich/policy/customer_policy
{
  "match": {
    "indices": "customers",
    "match_field": "customer_id",
    "enrich_fields": ["name", "email"]
  }
}
// Then ingest data without executing the policy
Correct approach:
PUT /_enrich/policy/customer_policy
{
  "match": {
    "indices": "customers",
    "match_field": "customer_id",
    "enrich_fields": ["name", "email"]
  }
}
POST /_enrich/policy/customer_policy/_execute
// Then ingest data
Root cause:Misunderstanding that enrich policies must be manually executed to build the enrich index before use.
#2: Expecting fuzzy matching in the enrich processor.
Wrong approach:
"enrich": {
  "policy_name": "customer_policy",
  "field": "cust_name",
  "target_field": "customer_info"
}
// where cust_name is partial or misspelled
Correct approach:
"enrich": {
  "policy_name": "customer_policy",
  "field": "customer_id",
  "target_field": "customer_info"
}
// where customer_id is an exact-match key
Root cause:Assuming enrich processor supports approximate matching instead of exact key matching.
#3: Overwriting important fields unintentionally during enrichment.
Wrong approach:
"enrich": {
  "policy_name": "product_policy",
  "field": "product_id",
  "target_field": "product_id"
}
// target_field same as source field
Correct approach:
"enrich": {
  "policy_name": "product_policy",
  "field": "product_id",
  "target_field": "product_info"
}
// target_field different to avoid overwrite
Root cause:Not separating enriched data fields from original document fields causes data loss.
Key Takeaways
The Enrich processor enhances documents by adding related data from a separate enrich index during ingestion.
Enrich policies define the source data and matching rules and must be manually executed to refresh enrichment data.
Enrichment uses exact-match lookups optimized for speed, not fuzzy or partial matching.
Proper configuration of fields and pipeline design prevents data conflicts and performance issues.
Advanced use includes chaining multiple enrich processors and understanding internal lookup mechanisms for efficient production use.