Overview - Indexing a document (POST/PUT)

What is it?

Indexing a document in Elasticsearch means adding or updating data in a searchable way. You send your data to Elasticsearch using HTTP POST or PUT requests. This process stores the document inside an index, which is like a folder for related data. Once indexed, the document can be quickly found using search queries.

Why it matters

Elasticsearch can only search data that has been indexed. Indexing organizes data so that searches stay fast and relevant even across large volumes of information, which is what makes responsive apps and near-real-time analysis possible.

Where it fits

Before learning indexing, you should understand basic HTTP methods and JSON format. After mastering indexing, you can learn about querying, updating, and deleting documents, as well as managing indices and mappings.

Mental Model

Core Idea

Indexing a document means storing data in Elasticsearch so it can be quickly searched and retrieved later.

Think of it like...

Imagine a library where each book is a document. Indexing is like placing a book on a specific shelf with a label, so you can find it easily when you want to read it.

┌─────────────┐     POST/PUT      ┌───────────────┐
│ Your Data   │ ────────────────▶ │ Elasticsearch │
│ (JSON doc)  │                   │ Index         │
└─────────────┘                   └───────────────┘
                                          │
                                          ▼
                                  ┌───────────────┐
                                  │ Stored Doc    │
                                  └───────────────┘

Build-Up - 7 Steps

1

Foundation: What is a Document in Elasticsearch

Concept: A document is a basic unit of data stored in Elasticsearch, formatted as JSON.

In Elasticsearch, data is stored as documents. Each document is a JSON object containing fields and values. For example, a document about a book might have fields like title, author, and year. Documents are stored inside an index, which groups similar documents together.

Result

You understand that documents are JSON objects representing data entries in Elasticsearch.

Knowing that documents are JSON objects helps you see how Elasticsearch stores and organizes data in a flexible, readable format.
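As a small illustration, a book document can be built as a plain dictionary and serialized to JSON. The field names follow the example above; the values are invented for this sketch.

```python
import json

# A hypothetical book document. The field names come from the text;
# the values are made up for illustration.
book = {
    "title": "My Book",
    "author": "Jane Doe",
    "year": 2020,
}

# Elasticsearch expects the document as a JSON object in the request body.
body = json.dumps(book)
print(body)
```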

2

Foundation: Understanding HTTP POST and PUT Methods

3

Intermediate: Indexing a Document with POST

4

Intermediate: Indexing a Document with PUT

5

Intermediate: Handling Responses After Indexing

6

Advanced: Indexing with Custom Routing and Parameters

7

Expert: Indexing Internals and Version Conflicts

Under the Hood

When you send a POST or PUT request to Elasticsearch, it parses the JSON document and determines the target index and document ID. The document is then analyzed and broken into terms for indexing. Elasticsearch stores the document in a shard based on routing and shard allocation. It updates internal data structures like inverted indices to enable fast search. Versioning tracks changes to prevent conflicts.
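The shard-selection step can be pictured with a simplified sketch: hash the routing value and take the result modulo the number of primary shards. The function name pick_shard is hypothetical, and CRC32 is only a stand-in; Elasticsearch uses its own hash function and a somewhat more involved formula, so the shard numbers produced here will not match a real cluster.

```python
import zlib


def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Toy shard selection: hash the routing value, then take the modulo.

    Assumption: CRC32 is a stand-in hash used purely for illustration.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# When no custom routing is given, the document ID acts as the routing value.
shard = pick_shard("123", 5)
print(shard)
```

The point of the sketch is that the same routing value always lands on the same shard, which is why the number of primary shards cannot simply be changed after documents have been indexed.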

Why designed this way?

Elasticsearch was designed for speed and scalability. Using HTTP methods and JSON makes it easy to integrate with many systems. Automatic ID generation simplifies bulk data loading, while explicit IDs and versioning provide control and consistency. Routing and shards distribute data for parallel processing, balancing load and search speed.

┌───────────────┐       HTTP POST/PUT       ┌───────────────┐
│ Client sends  │ ───────────────────────▶ │ Elasticsearch │
│ JSON document │                         │ REST API      │
└───────────────┘                         └───────────────┘
          │                                         │
          ▼                                         ▼
┌─────────────────┐                      ┌─────────────────┐
│ Parse JSON      │                      │ Determine Index │
│ & Validate      │                      │ & Document ID   │
└─────────────────┘                      └─────────────────┘
          │                                         │
          ▼                                         ▼
┌─────────────────┐                      ┌─────────────────┐
│ Analyze Document│                      │ Assign to Shard │
│ (tokenize text) │                      │ (using routing) │
└─────────────────┘                      └─────────────────┘
          │                                         │
          ▼                                         ▼
┌─────────────────┐                      ┌─────────────────┐
│ Update Inverted │                      │ Store Document  │
│ Index & Version │                      │ & Metadata      │
└─────────────────┘                      └─────────────────┘

Myth Busters - 4 Common Misconceptions

Quick: Does POST require you to specify the document ID? Commit to yes or no.

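Common Belief: POST always requires you to provide a document ID.

Reality: POST does not require an ID. When you post to the index without one, Elasticsearch generates a unique ID automatically.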

Quick: Does PUT only update existing documents or can it create new ones? Commit to your answer.

Common Belief: PUT can only update documents that already exist.

Reality: PUT creates the document if the ID does not exist yet and replaces it if it does.

Quick: Does Elasticsearch allow overwriting documents without any version checks? Commit to yes or no.

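Common Belief: Elasticsearch overwrites documents silently without checking versions.

Reality: Every write is tracked with a version, sequence number, and primary term. A request that passes if_seq_no and if_primary_term is rejected with a version conflict if the document changed in the meantime; without those parameters, the last write wins.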

Quick: Does indexing a document immediately make it searchable? Commit to yes or no.

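Common Belief: Once indexed, a document is instantly searchable.

Reality: A newly indexed document becomes visible to search only after the next refresh, which by default happens about once per second.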

Expert Zone

1

Elasticsearch tracks a sequence number and primary term for every document, which supports optimistic concurrency control and helps distributed clients avoid race conditions.

2

Routing keys influence shard placement, which can dramatically affect query performance and cluster balance, but misuse can cause hotspots.

3

The refresh interval controls how often Elasticsearch makes indexed documents visible to search, balancing latency and performance.

When NOT to use

Sending one document per POST/PUT request is a poor fit for high-volume ingestion, because every request pays its own network and coordination overhead. In such cases, use Elasticsearch's bulk API, optionally fed by an external queue that batches updates.

Production Patterns

In production, indexing often uses the bulk API to send many documents at once for efficiency. Applications manage document IDs carefully to avoid conflicts and use routing to optimize shard usage. Versioning and optimistic concurrency control prevent data corruption in multi-user environments.
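A bulk request body is newline-delimited JSON: an action line followed by the document itself, repeated for each document. The helper below is a minimal sketch of building such a body; the name build_bulk_body and the sample documents are assumptions for illustration.

```python
import json


def build_bulk_body(index: str, docs: dict[str, dict]) -> str:
    """Build an NDJSON body for the _bulk endpoint from {doc_id: document}."""
    lines = []
    for doc_id, doc in docs.items():
        # Action line: index this document under the given ID.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("books", {"1": {"title": "A"}, "2": {"title": "B"}})
print(body)
```

The resulting string would be sent as the body of POST /_bulk with the Content-Type header set to application/x-ndjson.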

Connections

HTTP Protocol

Indexing uses HTTP methods POST and PUT to communicate with Elasticsearch's REST API.

Understanding HTTP methods helps grasp how Elasticsearch receives and processes data, making integration with web services straightforward.

Inverted Index (Information Retrieval)

Indexing a document builds an inverted index to enable fast full-text search.

Knowing how inverted indices work explains why Elasticsearch can search large text collections quickly after indexing.
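A toy inverted index makes the idea concrete: each term maps to the set of documents that contain it, so a lookup by term does not need to scan every document. This is a simplified sketch with a naive tokenizer, not how Elasticsearch or Lucene store data internally; the function names and sample texts are made up.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Naive analysis: lowercase the text and keep alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every term to the IDs of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


docs = {"1": "The quick brown fox", "2": "The lazy dog", "3": "Quick thinking"}
inverted = build_inverted_index(docs)
print(sorted(inverted["quick"]))  # ['1', '3']
```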

Optimistic Concurrency Control (Distributed Systems)

Elasticsearch implements optimistic concurrency control by checking a document's sequence number and primary term when a write supplies if_seq_no and if_primary_term.

Recognizing this connection clarifies how Elasticsearch prevents conflicting updates in distributed environments.

Common Pitfalls

#1 Trying to index a document with PUT without specifying an ID in the URL.

Wrong approach: PUT /books/_doc { "title": "My Book" }

Correct approach: PUT /books/_doc/123 { "title": "My Book" } or, to let Elasticsearch generate the ID, POST /books/_doc { "title": "My Book" }

Root cause: Confusing POST and PUT semantics; PUT always needs an ID in the URL, while POST works with or without one.
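The choice between the two request forms can be sketched as a small helper that uses POST when there is no ID and PUT when there is one. The helper name, the base URL, and the sample documents are assumptions for illustration; the request objects are only built here, not sent.

```python
import json
import urllib.request

# Assumption: a local, unsecured development node.
BASE_URL = "http://localhost:9200"


def build_index_request(index: str, doc: dict, doc_id=None) -> urllib.request.Request:
    """Build (but do not send) an index request for one document."""
    if doc_id is None:
        # No ID given: POST to the index and let Elasticsearch generate one.
        method, path = "POST", f"/{index}/_doc"
    else:
        # ID given: PUT to the document's own URL (creates or replaces it).
        method, path = "PUT", f"/{index}/_doc/{doc_id}"
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


auto_id = build_index_request("books", {"title": "New Book"})
fixed_id = build_index_request("books", {"title": "My Book"}, doc_id="123")
print(auto_id.get_method(), auto_id.full_url)
print(fixed_id.get_method(), fixed_id.full_url)
```

Against a running cluster, either request could then be sent with urllib.request.urlopen.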

#2 Ignoring the response after indexing and assuming success.

Wrong approach: POST /books/_doc { "title": "New Book" } (no response check)

Correct approach: POST /books/_doc { "title": "New Book" } and check the response for 'result': 'created' (or 'updated' when replacing an existing document)

Root cause: Not verifying server responses leads to missed errors or failed indexing.
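A response check might look like the sketch below. It assumes the response body has already been parsed into a dictionary; the function name is hypothetical, and the sample mirrors the general shape of an index response with invented values.

```python
def check_index_response(response: dict) -> str:
    """Return the result of an index call, or raise if it did not succeed."""
    result = response.get("result")
    if result not in ("created", "updated"):
        raise RuntimeError(f"unexpected index result: {result!r}")
    # A non-zero "failed" count means at least one shard copy rejected the write.
    if response.get("_shards", {}).get("failed", 0) > 0:
        raise RuntimeError("write failed on at least one shard copy")
    return result


# Sample response with made-up values.
sample = {
    "_index": "books",
    "_id": "123",
    "_version": 1,
    "result": "created",
    "_shards": {"total": 2, "successful": 1, "failed": 0},
    "_seq_no": 0,
    "_primary_term": 1,
}
print(check_index_response(sample))  # created
```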

#3 Updating a document without handling version conflicts.

Wrong approach: PUT /books/_doc/123 { "title": "Updated" } without concurrency parameters

Correct approach: PUT /books/_doc/123?if_seq_no=10&if_primary_term=1 { "title": "Updated" }

Root cause: Without if_seq_no and if_primary_term, the last write wins and a concurrent change is silently overwritten.
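The values for if_seq_no and if_primary_term normally come from an earlier read of the same document. The sketch below builds the conditional path from such a read; the helper name and the fetched values are assumptions for illustration.

```python
from urllib.parse import urlencode


def conditional_index_path(index: str, doc_id: str, current: dict) -> str:
    """Build a path for a write that succeeds only if the document is unchanged.

    `current` is the parsed body of an earlier GET /<index>/_doc/<id> response,
    which carries the document's _seq_no and _primary_term.
    """
    params = urlencode({
        "if_seq_no": current["_seq_no"],
        "if_primary_term": current["_primary_term"],
    })
    return f"/{index}/_doc/{doc_id}?{params}"


# Made-up result of a previous GET for document 123.
fetched = {"_id": "123", "_seq_no": 10, "_primary_term": 1,
           "_source": {"title": "My Book"}}
path = conditional_index_path("books", "123", fetched)
print(path)  # /books/_doc/123?if_seq_no=10&if_primary_term=1
```

If another client has modified the document since the read, Elasticsearch rejects the write with HTTP 409, and the caller can re-read the document and retry.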

Key Takeaways

Indexing a document means storing JSON data in Elasticsearch so it can be searched quickly.

POST lets Elasticsearch generate document IDs automatically, while PUT requires specifying the ID and can update or create documents.

Elasticsearch returns detailed responses after indexing, which you should check to confirm success.

Versioning and routing are advanced features that control document updates and shard placement, improving consistency and performance.

Understanding the HTTP methods and Elasticsearch's internal mechanisms helps avoid common mistakes and build efficient, reliable search applications.