Elasticsearch · Concept · Beginner · 3 min read

What Is an Ingest Pipeline in Elasticsearch: Explanation and Example

An ingest pipeline in Elasticsearch is a way to preprocess and transform documents before they are indexed. It lets you define a series of steps, called processors, that automatically modify incoming data, for example by adding fields or parsing text.
⚙️ How It Works

Think of an ingest pipeline as a factory assembly line for your data. When you send data to Elasticsearch, it first passes through this line where each station (called a processor) performs a specific task, like cleaning or enriching the data. This happens automatically before the data is stored.

For example, you might want to extract parts of a message, convert dates, or add tags. Each processor in the pipeline handles one of these tasks in order. This way, your data is ready and consistent when it reaches Elasticsearch, saving you from doing these changes later.
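To make the "assembly line" concrete, here is a sketch of a pipeline that chains the three kinds of tasks mentioned above: a `grok` processor to extract parts of a message, a `date` processor to convert a date string, and an `append` processor to add a tag. The field names (`message`, `timestamp`, `client_ip`, and so on) are illustrative assumptions, not fixed requirements.

```json
PUT _ingest/pipeline/logs_pipeline
{
  "description": "Parse log lines, normalize dates, and tag documents",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{IP:client_ip} %{WORD:http_method} %{URIPATHPARAM:request}"]
      }
    },
    {
      "date": {
        "field": "timestamp",
        "formats": ["dd/MM/yyyy HH:mm:ss"],
        "target_field": "@timestamp"
      }
    },
    {
      "append": {
        "field": "tags",
        "value": ["parsed"]
      }
    }
  ]
}
```

Each processor runs in the order listed, so a document flows through `grok`, then `date`, then `append` before it is stored.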

💻 Example

This example shows how to create a simple ingest pipeline that adds a new field called ingested_at with the current timestamp to each document.

PUT _ingest/pipeline/add_timestamp
{
  "description": "Adds ingestion timestamp",
  "processors": [
    {
      "set": {
        "field": "ingested_at",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}

PUT my-index/_doc/1?pipeline=add_timestamp
{
  "message": "Hello, Elasticsearch!"
}

GET my-index/_doc/1
Output
{
  "_index": "my-index",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "message": "Hello, Elasticsearch!",
    "ingested_at": "2024-06-01T12:00:00.000Z"
  }
}
🎯 When to Use

Use ingest pipelines when you want to automate data preparation before indexing. This is helpful if your data needs cleaning, formatting, or enrichment without changing the original source.

Common use cases include parsing logs, extracting fields from text, adding geo-location data, or converting date formats. It saves time by centralizing data transformations and ensures consistent data quality in Elasticsearch.
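Before attaching a pipeline to real traffic, you can test it against sample documents with the simulate API, which runs the processors without indexing anything. This sketch simulates the `add_timestamp` pipeline from the example above against one test document:

```json
POST _ingest/pipeline/add_timestamp/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "Test message"
      }
    }
  ]
}
```

The response shows each document as it would look after the pipeline runs, so you can verify the transformations before any data is stored.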

Key Points

  • An ingest pipeline processes data before indexing in Elasticsearch.
  • It uses processors like set, grok, and date to transform data.
  • Pipelines help keep data clean and consistent automatically.
  • You attach a pipeline to an index request to apply it.
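Besides passing `?pipeline=...` on each request, you can make a pipeline the default for an index with the `index.default_pipeline` setting, so every document indexed into it is processed automatically. A minimal sketch, reusing the `add_timestamp` pipeline and `my-index` from the example:

```json
PUT my-index/_settings
{
  "index.default_pipeline": "add_timestamp"
}
```

With this setting in place, plain index requests to `my-index` go through the pipeline without any extra query parameter.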

Key Takeaways

Ingest pipelines automate data transformation before indexing in Elasticsearch.
They use processors to modify or enrich documents step-by-step.
Pipelines improve data quality and reduce manual preprocessing.
You specify a pipeline when sending data to Elasticsearch for automatic processing.