
How to Create Ingest Pipeline in Elasticsearch Quickly

To create an ingest pipeline in Elasticsearch, use the PUT _ingest/pipeline/{pipeline_id} API with a JSON body that defines a list of processors. The pipeline processes documents before they are indexed, enabling transformations such as parsing, enrichment, or field removal.
📐 Syntax

The basic syntax to create an ingest pipeline uses the PUT HTTP method on the _ingest/pipeline/{pipeline_id} endpoint. The request body is a JSON object that defines the pipeline's description and a list of processors that modify documents.

  • pipeline_id: A unique name for your pipeline.
  • description: A short text describing the pipeline.
  • processors: An array of actions to perform on documents, such as parsing or removing fields.
json
PUT _ingest/pipeline/{pipeline_id}
{
  "description": "Description of what this pipeline does",
  "processors": [
    {
      "processor_type": {
        "field": "field_name",
        "target_field": "new_field_name"
      }
    }
  ]
}
💻 Example

This example creates a pipeline named my_pipeline that adds a timestamp field called ingest_timestamp to each document when it is ingested.

json
PUT _ingest/pipeline/my_pipeline
{
  "description": "Adds ingest timestamp",
  "processors": [
    {
      "set": {
        "field": "ingest_timestamp",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
Output
{ "acknowledged": true }
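
To confirm the pipeline was stored, retrieve its definition; the response echoes back the description and processors you submitted.

json
GET _ingest/pipeline/my_pipeline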
⚠️ Common Pitfalls

Common mistakes when creating ingest pipelines include:

  • Using invalid processor types or misspelling processor names.
  • Not providing required fields for processors, causing errors.
  • Forgetting to specify the pipeline when indexing documents, so the pipeline is never applied.
  • Trying to modify fields that do not exist in the document.
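
To avoid the third pitfall, reference the pipeline explicitly via the pipeline query parameter when indexing (the index and document below are hypothetical):

json
PUT my-index/_doc/1?pipeline=my_pipeline
{
  "message": "Hello"
}

Alternatively, set index.default_pipeline in the index settings so the pipeline runs automatically for every document indexed into that index.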

Always test your pipeline with the _simulate API before using it in production.

json
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "set": {
          "field": "ingest_timestamp",
          "value": "{{_ingest.timestamp}}"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "Test document"
      }
    }
  ]
}
Output
{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_type": "_doc",
        "_id": "_id",
        "_source": {
          "message": "Test document",
          "ingest_timestamp": "2024-06-01T12:00:00.000Z"
        }
      }
    }
  ]
}
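
To guard against the last pitfall, many processors (such as remove) accept an ignore_missing option so an absent field does not fail the whole document. A minimal sketch with a hypothetical temp_field:

json
PUT _ingest/pipeline/cleanup_pipeline
{
  "description": "Removes a temporary field if present",
  "processors": [
    {
      "remove": {
        "field": "temp_field",
        "ignore_missing": true
      }
    }
  ]
}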
📊 Quick Reference

  • pipeline_id: Unique name for the ingest pipeline.
  • description: Short text describing the pipeline's purpose.
  • processors: Array of actions to transform documents.
  • set processor: Adds or updates a field with a specified value.
  • remove processor: Removes a specified field from documents.
  • grok processor: Parses text fields using patterns.
  • _simulate API: Tests a pipeline without indexing data.
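
The grok processor listed above parses unstructured text into named fields; here is a brief sketch for a simple log line (the pattern and field names are illustrative):

json
PUT _ingest/pipeline/parse_logs
{
  "description": "Extracts fields from a simple log line",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{IP:client_ip} %{WORD:method} %{URIPATHPARAM:request}"]
      }
    }
  ]
}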

Key Takeaways

  • Use PUT _ingest/pipeline/{pipeline_id} with a JSON body to create a pipeline.
  • Define processors inside the pipeline to transform documents before indexing.
  • Test pipelines with the _simulate API to avoid errors.
  • Always specify the pipeline when indexing documents to apply it.
  • Common processors include set, remove, and grok for flexible data handling.