How to Enrich Data During Indexing in Elasticsearch
To enrich data during indexing in Elasticsearch, use ingest pipelines. A pipeline runs processors such as `set`, `script`, or `geoip` to modify or add fields before each document is stored. This lets you transform values, attach metadata, or add derived information automatically as data is indexed.
Syntax
An ingest pipeline is an ordered list of processors. When a document is indexed through the pipeline, the processors run one after another. Each one performs a single task, such as adding a field, running a script, or extracting data from an existing field.
Basic syntax to create a pipeline:
```json
PUT _ingest/pipeline/pipeline_name
{
  "description": "Pipeline description",
  "processors": [
    { "processor_type": { "field": "value", ... } },
    ...
  ]
}
```
When indexing, pass the pipeline name in the `pipeline` query parameter to apply it:
```json
POST /index/_doc?pipeline=pipeline_name
{
  "field": "value"
}
```
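Sometimes every document written to an index should pass through the same pipeline. In that case you can make it the index's default with the `index.default_pipeline` setting, and indexing requests no longer need the query parameter. Here is a minimal sketch that reuses the placeholder names `index` and `pipeline_name` from the syntax above:

```json
PUT /index/_settings
{
  "index.default_pipeline": "pipeline_name"
}
```

A `pipeline` parameter on an individual request overrides this default, and the special value `_none` disables it.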
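For example, this pipeline body contains a single `set` processor that adds `new_field` with the value `enriched_value` to every document:
```json
{
  "description": "Add a new field",
  "processors": [
    {
      "set": {
        "field": "new_field",
        "value": "enriched_value"
      }
    }
  ]
}
```
Example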
The following example creates an ingest pipeline named `enrich_pipeline`. It adds a `source` field and uses the `geoip` processor to look up geolocation data for the value in the `ip` field. It then indexes a document through this pipeline.
```json
PUT _ingest/pipeline/enrich_pipeline
{
  "description": "Add source and geoip info",
  "processors": [
    {
      "set": {
        "field": "source",
        "value": "web"
      }
    },
    {
      "geoip": {
        "field": "ip"
      }
    }
  ]
}

POST /logs/_doc?pipeline=enrich_pipeline
{
  "ip": "8.8.8.8",
  "message": "User accessed the site"
}
```
Output
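The indexing response confirms that the document was created. The `_id` is auto-generated (shown here as `generated_id`), and the `_shards` counts depend on your cluster's replica configuration.
```json
{
  "_index": "logs",
  "_id": "generated_id",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}
```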
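The indexing response only confirms the write and does not echo the enriched document. One way to check that the pipeline ran is to search on a field it added. Here is a sketch against the `logs` index from this example:

```json
GET /logs/_search
{
  "query": {
    "match": { "source": "web" }
  }
}
```

Each hit's `_source` should include `"source": "web"`. If the IP address is found in the GeoIP database, it should also include a `geoip` object, which is the geoip processor's default `target_field`.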
Common Pitfalls
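- If you omit the `pipeline` parameter and the index has no default pipeline configured, the document is indexed without any enrichment.
- A misspelled processor option makes the pipeline creation request fail. A processor that references a field missing from the document fails at index time unless `ignore_missing` is set (on processors that support it).
- Every processor runs on every document, so long pipelines or expensive scripts increase indexing time.
- Test `script` processors carefully: unless the pipeline handles the failure, a script that throws an exception causes the indexing request for that document to fail.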
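Before you attach a pipeline to real traffic, you can run it against sample documents with the simulate API. It returns the transformed documents without indexing anything. The sketch below defines the pipeline inline rather than referring to a stored one, and the field names `msg` and `message_length` are illustrative assumptions.

It also shows two safeguards for the pitfalls above:
- `ignore_missing` lets `rename` skip documents that lack its field.
- A pipeline-level `on_failure` block records the error message on the document instead of rejecting it.

```json
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      { "rename": { "field": "msg", "target_field": "message", "ignore_missing": true } },
      {
        "script": {
          "source": "if (ctx.message != null) { ctx.message_length = ctx.message.length(); }"
        }
      }
    ],
    "on_failure": [
      { "set": { "field": "error.message", "value": "{{ _ingest.on_failure_message }}" } }
    ]
  },
  "docs": [
    { "_source": { "message": "User accessed the site" } },
    { "_source": { "ip": "8.8.8.8" } }
  ]
}
```

The second sample document has no `message` field, so the null check in the script skips it instead of throwing. Adding `?verbose=true` to the request shows the document after each individual processor.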
Wrong way (missing pipeline):
```json
POST /logs/_doc
{
  "ip": "8.8.8.8",
  "message": "User accessed the site"
}
```
Right way (using pipeline):
```json
POST /logs/_doc?pipeline=enrich_pipeline
{
  "ip": "8.8.8.8",
  "message": "User accessed the site"
}
```
Quick Reference
| Processor | Purpose | Example usage |
|---|---|---|
| `set` | Add or update a field | `{"set": {"field": "status", "value": "active"}}` |
| `geoip` | Add geolocation data for an IP address | `{"geoip": {"field": "ip"}}` |
| `script` | Run a Painless script to modify the document | `{"script": {"source": "ctx.field += ' enriched'"}}` |
| `rename` | Rename a field | `{"rename": {"field": "old", "target_field": "new"}}` |
| `remove` | Remove a field | `{"remove": {"field": "temp"}}` |
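In practice these processors are usually combined in one pipeline, where they run in the order listed. Here is a sketch with a hypothetical pipeline name (`cleanup_pipeline`) and hypothetical field names (`msg`, `message`, `temp`):

```json
PUT _ingest/pipeline/cleanup_pipeline
{
  "description": "Rename, tag, enrich, and clean up fields",
  "processors": [
    { "rename": { "field": "msg", "target_field": "message", "ignore_missing": true } },
    { "set": { "field": "status", "value": "active" } },
    {
      "script": {
        "source": "if (ctx.message != null) { ctx.message = ctx.message + ' enriched'; }"
      }
    },
    { "remove": { "field": "temp", "ignore_missing": true } }
  ]
}
```

Because `rename` runs first, the script can rely on `message` even for documents that arrived with the older `msg` field name.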
Key Takeaways
- Use ingest pipelines with processors to enrich documents automatically during indexing.
- Apply a pipeline with the `pipeline` query parameter on each indexing request, or set `index.default_pipeline` on the index.
- Test processors and scripts, for example with the simulate API, before using them on production data.
- Common processors include `set`, `geoip`, `script`, `rename`, and `remove`.
- Keep pipelines short and scripts cheap, because every processor runs on every indexed document.