Ingest pipelines in Elasticsearch - Time & Space Complexity
When using ingest pipelines in Elasticsearch, it's important to know how processing time changes as more data flows through. The goal here is to understand how the pipeline's steps affect total processing time as input size grows.
Analyze the time complexity of this ingest pipeline configuration.
```json
PUT _ingest/pipeline/sample_pipeline
{
  "processors": [
    { "set": { "field": "field1", "value": "value1" } },
    { "uppercase": { "field": "field2" } },
    { "rename": { "field": "field3", "target_field": "field3_renamed" } }
  ]
}
```
This pipeline applies three processors, in order, to every document that passes through it: each document is handed to `set`, then `uppercase`, then `rename`.
- Primary operation: Sequential processing of each document by all processors.
- How many times: Once per document, for each processor.
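This per-document, per-processor cost model can be sketched in Python. The processor functions below mirror the names in the config, but this is a toy model of the cost structure, not Elasticsearch's actual implementation:

```python
# Toy model of the pipeline above: each processor is a function applied
# to every document in sequence (n docs x p processors).

def set_field(doc):
    doc["field1"] = "value1"                    # "set": O(1) per document
    return doc

def uppercase(doc):
    doc["field2"] = doc["field2"].upper()       # "uppercase": O(len(field2))
    return doc

def rename(doc):
    doc["field3_renamed"] = doc.pop("field3")   # "rename": O(1)
    return doc

PROCESSORS = [set_field, uppercase, rename]

def run_pipeline(docs):
    """Apply every processor to every document."""
    for doc in docs:              # n iterations
        for proc in PROCESSORS:   # p iterations (3 here)
            proc(doc)
    return docs

docs = [{"field2": "hello", "field3": "v"} for _ in range(10)]
run_pipeline(docs)
```

The nested loop makes the cost structure explicit: the outer loop runs once per document, the inner loop once per processor, so total work is the product of the two.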
As the number of documents increases, the total processing time grows proportionally.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 30 (10 docs x 3 processors) |
| 100 | 300 (100 docs x 3 processors) |
| 1000 | 3000 (1000 docs x 3 processors) |
Pattern observation: The total work grows directly with the number of documents.
Time Complexity: O(n) for a fixed processor count (O(n × p) if the number of processors p also varies)
This means processing time grows linearly: doubling the number of documents roughly doubles the ingest time.
[X] Wrong: "Adding more processors won't affect the time much because they run fast."
[OK] Correct: Each processor adds work for every document, so more processors multiply the total time.
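The correction above can be checked with a line of arithmetic. The helper below is purely illustrative, matching the operation counts in the table:

```python
# Total work is documents x processors, so processor count multiplies cost.
def total_ops(n_docs, n_processors):
    return n_docs * n_processors

# Matches the table above: 10, 100, 1000 docs x 3 processors.
for n in (10, 100, 1000):
    print(n, total_ops(n, 3))

# Doubling the processor count doubles the work at every input size:
assert total_ops(1000, 6) == 2 * total_ops(1000, 3)
```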
Understanding how ingest pipelines scale helps you design efficient data flows and shows you can think about performance in real systems.
What if we added a processor that loops inside itself for each document? How would the time complexity change?
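One way to reason about this: Elasticsearch's real `foreach` processor applies a sub-processor to each element of an array field. A toy Python analogue (the `tags` field name is a made-up example) shows how the complexity changes:

```python
# A "foreach"-style processor loops over a list field inside each
# document. If that list holds k items, the pipeline now does n * k
# inner operations: O(n * k), and O(n^2) in the worst case where the
# list length grows with the input.

def foreach_uppercase(doc, field="tags"):
    doc[field] = [item.upper() for item in doc[field]]  # k operations
    return doc

docs = [{"tags": ["a", "b", "c"]} for _ in range(5)]
for doc in docs:            # n iterations
    foreach_uppercase(doc)  # k iterations each
```

So a per-document inner loop multiplies the cost by the inner loop's size, rather than adding a constant amount of work per document.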