Enrich processor in Elasticsearch - Time Complexity
When using the enrich processor in Elasticsearch, it is important to understand how processing time changes as the number of documents grows, that is, how much work the processor does when more documents need enrichment.
Analyze the time complexity of the following enrich processor configuration snippet.
```json
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "enrich": {
          "policy_name": "user_policy",
          "field": "user_id",
          "target_field": "user_info"
        }
      }
    ]
  },
  "docs": [
    {"_source": {"user_id": "123"}}
  ]
}
```
This snippet simulates enriching documents by looking up user info keyed on user_id via a stored enrich policy. The dominant repeated operation is the lookup performed for each document.
- Primary operation: For each document, the enrich processor performs a lookup in the enrich index.
- How many times: Once per document being processed.
As the number of documents increases, the total number of lookups grows linearly.
| Input Size (n) | Approx. Operations (lookups) |
|---|---|
| 10 | 10 lookups |
| 100 | 100 lookups |
| 1000 | 1000 lookups |
Pattern observation: Each new document adds one more lookup, so the work grows steadily with the number of documents.
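The pattern above can be sketched with a small model. This is a hypothetical simplification of the enrich processor (a plain dictionary standing in for the enrich index, not Elasticsearch internals), written only to make the one-lookup-per-document behavior concrete:

```python
# Minimal model of per-document enrichment (hypothetical, not
# Elasticsearch internals): one enrich-index lookup per document.

def enrich_docs(docs, enrich_index):
    """Return enriched copies of docs and the number of lookups performed."""
    lookups = 0
    enriched = []
    for doc in docs:
        lookups += 1  # one lookup against the enrich index per document
        info = enrich_index.get(doc.get("user_id"))
        enriched.append({**doc, "user_info": info})
    return enriched, lookups

enrich_index = {"123": {"name": "Alice"}, "456": {"name": "Bob"}}

for n in (10, 100, 1000):
    docs = [{"user_id": "123"} for _ in range(n)]
    _, lookups = enrich_docs(docs, enrich_index)
    print(n, lookups)  # lookup count equals n: linear growth
```

Running the loop reproduces the table: 10 documents cost 10 lookups, 1000 documents cost 1000 lookups.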
Time Complexity: O(n)
This means the time to enrich documents grows directly in proportion to how many documents you process.
[X] Wrong: "The enrich processor does all lookups once and reuses results, so time stays the same no matter how many documents."
[OK] Correct: Each document requires its own lookup because different documents have different keys to enrich, so the processor must do work for each one.
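To see why reuse does not make the cost constant, consider a sketch with an added lookup cache (an assumption for illustration; this is not how Elasticsearch implements enrichment). Even with memoization, every document must still be visited, so total work stays O(n); the cache only avoids repeated index hits when keys happen to repeat:

```python
# Sketch (assumed caching layer, not Elasticsearch's actual design):
# memoizing lookups reduces enrich-index hits for duplicate keys,
# but each document is still processed once, so time remains O(n).

def enrich_with_cache(docs, enrich_index):
    """Enrich docs in place; return the number of real index hits."""
    cache = {}
    index_hits = 0
    for doc in docs:
        key = doc["user_id"]
        if key not in cache:
            index_hits += 1              # actual enrich-index lookup
            cache[key] = enrich_index.get(key)
        doc["user_info"] = cache[key]    # per-document work still happens
    return index_hits

enrich_index = {str(i): {"name": f"user{i}"} for i in range(1000)}

distinct = [{"user_id": str(i)} for i in range(1000)]
repeated = [{"user_id": "123"} for _ in range(1000)]

print(enrich_with_cache(distinct, enrich_index))  # 1000 hits: all keys differ
print(enrich_with_cache(repeated, enrich_index))  # 1 hit, yet 1000 docs still processed
```

With all-distinct keys the cache saves nothing; with repeated keys it cuts index hits to one, but the loop over n documents remains, so the asymptotic class does not change.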
Understanding how the enrich processor scales helps you reason about how Elasticsearch handles data enrichment, a useful skill when discussing data pipelines and search performance.
"What if the enrich index was cached in memory for faster lookups? How would that affect the time complexity?"