
Why documents are the unit of data in Elasticsearch - Visual Breakdown

Concept Flow - Why documents are the unit of data
1. User sends data
2. Data split into documents
3. Each document indexed separately
4. Documents stored in shards
5. Search queries run on documents
6. Results returned based on document matches
Data is broken into documents, and each document is indexed and searched independently. This lets Elasticsearch spread documents across shards and search those shards in parallel.
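The "documents stored in shards" step of the flow can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Elasticsearch's implementation: Elasticsearch routes each document by hashing a routing value (the document ID by default) with Murmur3, while the sketch substitutes `zlib.crc32` from the standard library. `NUM_SHARDS` and `pick_shard` are hypothetical names used only for illustration.

```python
import zlib

NUM_SHARDS = 3  # hypothetical number of primary shards for the toy index


def pick_shard(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a document ID to a shard number (simplified stand-in for routing)."""
    # crc32 is only a stand-in for the hash Elasticsearch actually uses.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


# Each document is routed on its own, so documents can land on different shards.
shards = {n: [] for n in range(NUM_SHARDS)}
for doc_id in ["1", "2", "3", "4", "5"]:
    shards[pick_shard(doc_id)].append(doc_id)
```

Because routing depends only on the document's own ID, the same document always maps to the same shard, and no document needs to know about any other.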
Execution Sample
Elasticsearch
POST /library/_doc/1
{
  "title": "Learn Elasticsearch",
  "author": "Jane"
}
This indexes a single document with ID 1, holding a book's title and author, into the 'library' index.
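For readers who want to issue the same request from code, here is a minimal sketch using only Python's standard library. It assumes a local, unsecured node at `http://localhost:9200`, which is not stated in the original; the actual send is left commented out so the snippet runs without a cluster.

```python
import json
import urllib.request

# Assumption: a local, unsecured node at this address; adjust for a real cluster.
BASE_URL = "http://localhost:9200"

# The document is one self-contained JSON object, as in the sample above.
document = {"title": "Learn Elasticsearch", "author": "Jane"}

request = urllib.request.Request(
    url=f"{BASE_URL}/library/_doc/1",
    data=json.dumps(document).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Sending is left commented out so the sketch runs without a cluster:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The whole document travels in one request body, which is what makes it the unit that Elasticsearch stores and later returns.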
Execution Table
| Step | Action | Data Unit | Storage Location | Effect |
|------|--------|-----------|------------------|--------|
| 1 | Receive data from user | Raw JSON | N/A | Data ready to be processed |
| 2 | Split data into documents | Single document | N/A | Each document is a self-contained unit |
| 3 | Index document | Document | Shard in index | Document stored and searchable |
| 4 | Run search query | Documents | Shards | Matches found per document |
| 5 | Return results | Documents | N/A | User gets relevant documents |
💡 All data is handled as documents, enabling efficient storage and search.
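Steps 1 to 3 of the table can be illustrated with a batch of records. In practice the client decides where one document ends and the next begins, then sends each one with its own index and ID. The sketch below builds a body for the `_bulk` endpoint, which takes newline-delimited JSON: an action line followed by the document source. The helper name `to_bulk_body` and the second book record are made up for illustration.

```python
import json


def to_bulk_body(index: str, records: list) -> str:
    """Turn a list of records into a newline-delimited _bulk request body."""
    lines = []
    for doc_id, record in enumerate(records, start=1):
        # Action line: tells Elasticsearch which index and ID this document gets.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        # Source line: the document itself, one self-contained JSON object.
        lines.append(json.dumps(record))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


books = [
    {"title": "Learn Elasticsearch", "author": "Jane"},
    {"title": "Search Basics", "author": "Omar"},  # made-up second record
]
body = to_bulk_body("library", books)
```

Each record becomes its own action and source pair, so one failed document does not prevent the others in the batch from being indexed.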
Variable Tracker
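| Variable | Start | After Step 2 | After Step 3 | After Step 4 | Final |
|----------|-------|--------------|--------------|--------------|-------|
| data | Raw JSON input | Split into documents | Indexed documents | Queried documents | Search results |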
Key Moments - 2 Insights
Why does Elasticsearch treat data as documents instead of rows or columns?
Because each document is a complete, self-contained unit of data that can be indexed and searched independently, as shown in execution_table step 3.
How does storing data as documents improve search speed?
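When a document is indexed, its terms are added to an inverted index that maps each term to the documents containing it. A query therefore looks up terms instead of scanning every document, and because documents are spread across shards, those lookups can run in parallel, as seen in execution_table steps 3 and 4.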
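The idea of finding matches without scanning the whole dataset can be shown with a toy inverted index. This is a simplified sketch, not how Elasticsearch is implemented: the real engine relies on Lucene, with analyzers, per-field indexes, and relevance scoring. The second document and the `search` helper are made up for illustration.

```python
from collections import defaultdict

documents = {
    "1": {"title": "Learn Elasticsearch", "author": "Jane"},
    "2": {"title": "Learn Python", "author": "Omar"},  # made-up second document
}

# term -> set of IDs of the documents that contain the term
inverted_index = defaultdict(set)
for doc_id, doc in documents.items():
    for term in doc["title"].lower().split():
        inverted_index[term].add(doc_id)


def search(term: str) -> list:
    """Return matching documents via one dictionary lookup, not a full scan."""
    doc_ids = inverted_index.get(term.lower(), set())
    return [documents[doc_id] for doc_id in sorted(doc_ids)]
```

The index stores document IDs, and a search returns whole documents, which is why the document is the natural unit for both indexing and results.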
Visual Quiz - 3 Questions
Test your understanding
In the execution_table, at which step is the data split into documents?
A. Step 3
B. Step 1
C. Step 2
D. Step 4
💡 Hint
Check the 'Action' column for 'Split data into documents' in execution_table.
According to variable_tracker, what is the state of 'data' after Step 3?
A. Raw JSON input
B. Indexed documents
C. Split into documents
D. Search results
💡 Hint
Look at the 'After Step 3' column for 'data' in variable_tracker.
If Elasticsearch did not use documents as units, what would likely happen to search speed?
A. Search speed would slow down
B. Search speed would improve
C. Search speed would stay the same
D. Search speed would be unpredictable
💡 Hint
Refer to key_moments about why documents improve search speed.
Concept Snapshot
In Elasticsearch, data is stored as documents.
Each document is a self-contained JSON object.
Documents are indexed separately for fast search.
This design allows flexible, scalable data handling.
Search queries match documents, not rows or columns.
Full Transcript
Elasticsearch treats data as documents because each document is a complete unit of information. When data is received, it is split into these documents. Each document is then indexed and stored in shards within the Elasticsearch cluster. This allows Elasticsearch to quickly search and retrieve relevant documents when a query is run. Handling data as documents rather than rows or columns improves speed and flexibility. The execution table shows the steps from receiving data to returning search results, and the variable tracker follows the state of data through these steps. Key moments clarify why documents are used and how they help search performance. The visual quiz tests understanding of these steps and concepts.