
Source filtering in Elasticsearch - Deep Dive

Overview - Source filtering
What is it?
Source filtering in Elasticsearch lets you control which parts of a document are returned in search results. Instead of getting the whole document, you can choose to get only specific fields or exclude some fields. This helps reduce the amount of data sent over the network and speeds up responses.
Why it matters
Without source filtering, every search returns full documents, which can be large and slow to transfer. This wastes bandwidth and processing time, especially when you only need a few fields. Source filtering solves this by sending only what you need, making applications faster and more efficient.
Where it fits
Before learning source filtering, you should understand basic Elasticsearch queries and how documents are stored. After mastering source filtering, you can explore advanced topics like field-level security, script fields, and performance tuning.
Mental Model
Core Idea
Source filtering is like choosing which chapters of a book to photocopy instead of copying the whole book.
Think of it like...
Imagine you have a big book but only want to share a few chapters with a friend. Instead of giving the entire book, you photocopy just those chapters. Source filtering works the same way by sending only selected parts of a document.
┌─────────────────────────────┐
│        Elasticsearch        │
│  ┌───────────────────────┐  │
│  │ Full Document         │  │
│  │ {                     │  │
│  │   title: "Book A",    │  │
│  │   author: "Alice",    │  │
│  │   content: "...",     │  │
│  │   date: "2024-01-01"  │  │
│  │ }                     │  │
│  └───────────────────────┘  │
│           │                 │
│           ▼ source filtering│
│  ┌───────────────────────┐  │
│  │ Filtered Document     │  │
│  │ {                     │  │
│  │   title: "Book A"     │  │
│  │ }                     │  │
│  └───────────────────────┘  │
└─────────────────────────────┘
Build-Up - 6 Steps
1. Foundation: What is source filtering
Concept: Source filtering controls which parts of a document Elasticsearch returns after a search.
When you search in Elasticsearch, it finds matching documents. By default, it sends the entire document back. Source filtering lets you pick only the fields you want, like just the title or author, instead of everything.
Result
You get smaller responses with only the fields you asked for.
Understanding that Elasticsearch can send partial documents helps you optimize data transfer and speed up your app.
2. Foundation: Basic syntax for source filtering
Concept: Elasticsearch uses the _source parameter of a search request to include or exclude fields.
In your search request, you add a _source parameter. It can be true (the default, return the whole document), false (return no source), or an array of field names to include. Example: { "query": { "match_all": {} }, "_source": ["title", "author"] } This returns only the title and author fields of each hit.
Result
Search results show only the specified fields.
Knowing the syntax lets you quickly control what data you get back without changing your index.
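As a minimal sketch of the three forms described above, the request bodies can be built as plain Python dictionaries and serialized to JSON. The helper name search_body is hypothetical, and actually sending the body to the search endpoint of an index is left out here.

```python
import json

# Hypothetical helper: wraps a query and a _source setting into one search body.
def search_body(source, query=None):
    return {"query": query or {"match_all": {}}, "_source": source}

all_fields = search_body(True)                  # default: whole document
no_source = search_body(False)                  # hits carry no _source
some_fields = search_body(["title", "author"])  # only these fields

# This JSON text is what would be sent as the search request body.
print(json.dumps(some_fields, indent=2))
```

Only the value of _source changes between the three bodies; the query part is identical.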
3. Intermediate: Including and excluding fields together
Before reading on: Do you think you can include and exclude fields at the same time in source filtering? Commit to yes or no.
Concept: You can specify both fields to include and fields to exclude in the _source parameter.
The _source parameter can also be an object with includes and excludes arrays: { "_source": { "includes": ["title", "author"], "excludes": ["author"] } } Excludes are applied on top of includes, so a field that matches both is dropped: here title and author are included, author is then excluded, and only title is returned.
Result
The final result contains only the fields after applying both include and exclude rules.
Understanding how include and exclude interact helps you fine-tune exactly which fields you want.
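The interaction of the two lists can be modeled locally. The function below is a simplified sketch, not Elasticsearch's implementation: it assumes top-level fields and exact names only, and the name apply_rules is made up for illustration.

```python
def apply_rules(doc, includes=(), excludes=()):
    """Simplified model: top-level fields, exact names, no wildcards."""
    # An empty includes list means "start from every field".
    kept = {k: v for k, v in doc.items() if not includes or k in includes}
    # Excludes are applied last, so a field listed in both is dropped.
    return {k: v for k, v in kept.items() if k not in excludes}

doc = {"title": "Book A", "author": "Alice", "content": "...", "date": "2024-01-01"}
result = apply_rules(doc, includes=["title", "author"], excludes=["author"])
print(result)  # {'title': 'Book A'}
```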
4. Intermediate: Using wildcards in source filtering
Before reading on: Can you use wildcards like * to select multiple fields in source filtering? Commit to yes or no.
Concept: Source filtering supports wildcards to match multiple fields by pattern.
You can use * to match many fields. For example: { "_source": ["user.*", "post.title"] } This returns every field under the user object (all paths beginning with "user.") plus the post.title field.
Result
Search results include all matching fields by the wildcard pattern.
Knowing wildcards lets you select groups of fields without listing each one, saving time and effort.
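To see which fields a pattern would select, the sketch below flattens a document into dotted paths and matches them with Python's fnmatchcase. This is an approximation under stated assumptions: fnmatchcase also understands ? and [...], which goes beyond the * wildcard described here, and the sample document and helper names are hypothetical.

```python
from fnmatch import fnmatchcase

def flatten(doc, prefix=""):
    """Yield a dotted path for every leaf value in a nested dict."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path

def select_paths(doc, patterns):
    """Return the leaf paths matched by at least one pattern."""
    return [p for p in flatten(doc) if any(fnmatchcase(p, pat) for pat in patterns)]

doc = {
    "user": {"name": "alice", "email": "alice@example.com"},
    "post": {"title": "Hello", "body": "..."},
}
print(select_paths(doc, ["user.*", "post.title"]))
# ['user.name', 'user.email', 'post.title']
```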
5. Advanced: Performance impact of source filtering
Before reading on: Does source filtering always improve performance, or can it sometimes add overhead? Commit to your guess.
Concept: Source filtering reduces the data sent but adds some CPU work to filter fields, especially with complex includes/excludes.
When you filter the source, Elasticsearch still loads the full _source for each hit and must parse it to remove unwanted fields. For small documents, this is fast. For large documents or many fields, filtering can add noticeable CPU overhead. However, the network and serialization savings usually outweigh this cost.
Result
You get faster network transfer but may see slight CPU increase on the server.
Knowing the tradeoff helps you decide when to use source filtering for best overall performance.
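A rough way to estimate the network side of this tradeoff is to compare serialized sizes. The sketch below uses a made-up document with a large content field; the numbers it prints describe only this sample, not any real index.

```python
import json

full_doc = {
    "title": "Book A",
    "author": "Alice",
    "content": "lorem ipsum " * 500,  # stand-in for a large body field
    "date": "2024-01-01",
}
filtered_doc = {"title": full_doc["title"]}  # what a title-only filter would return

full_bytes = len(json.dumps(full_doc).encode("utf-8"))
filtered_bytes = len(json.dumps(filtered_doc).encode("utf-8"))

print(f"full: {full_bytes} B, filtered: {filtered_bytes} B per hit")
print(f"saved over 100 hits: {(full_bytes - filtered_bytes) * 100} B")
```

The CPU cost of filtering is not captured here; measuring it requires profiling against a real cluster.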
6. Expert: Source filtering with nested and object fields
Before reading on: Do you think source filtering can selectively return parts of nested objects, or does it return entire nested objects always? Commit to your answer.
Concept: Source filtering can include or exclude specific sub-fields of nested and object fields, but it selects by field path, not by array element.
For nested or object fields, you can specify paths like "comments.author" to include only the authors of comments. The path is applied to every element of the comments array, so each returned comment keeps only its author field. What source filtering cannot do is choose which elements come back: all comments are returned (trimmed to the requested fields), even if a nested query matched only one of them. Parts of the response outside _source, such as inner_hits, are filtered separately, which can lead to unexpected results.
Result
You can control which sub-fields of nested objects are returned, but not which nested objects.
Understanding nested field behavior prevents surprises and helps design your index and queries for clear results.
Under the Hood
Elasticsearch stores the original JSON document in a special _source field. When a search matches, it retrieves this stored _source. Source filtering works by parsing this JSON and removing fields not requested before sending the response. This happens after the query finds matching documents but before sending results to the client.
Why designed this way?
Storing the full _source allows Elasticsearch to reconstruct documents exactly as indexed. Source filtering was added to reduce network load and client processing without changing how documents are stored. Alternatives like storing only selected fields would limit flexibility and require reindexing.
┌───────────────┐
│ Search Query  │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Query Engine  │
│ (find matches)│
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Load _source  │
│ (full JSON)   │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Source Filter │
│ (include/excl)│
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Send Response │
│ (filtered doc)│
└───────────────┘
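The pipeline in the diagram can be sketched as a small local model. It is a simplification under stated assumptions, not Elasticsearch's code: matching is a plain Python predicate, only include paths without wildcards are supported, and the names keep_paths and search are hypothetical.

```python
def keep_paths(value, paths):
    """Keep only the given dotted paths; lists are filtered element by element."""
    if isinstance(value, list):
        return [keep_paths(item, paths) for item in value]
    if not isinstance(value, dict):
        return value
    out = {}
    for key, child in value.items():
        if key in paths:  # the whole field was requested
            out[key] = child
            continue
        sub = [p[len(key) + 1:] for p in paths if p.startswith(key + ".")]
        if sub:  # only some sub-fields were requested
            out[key] = keep_paths(child, sub)
    return out

def search(index, matches, source_paths):
    hits = [doc for doc in index if matches(doc)]           # find matches on full docs
    return [keep_paths(doc, source_paths) for doc in hits]  # then filter each _source

index = [
    {"title": "Book A", "author": "Alice", "comments": [{"author": "Bob", "text": "Nice"}]},
    {"title": "Book B", "author": "Carol", "comments": []},
]
print(search(index, lambda d: d["author"] == "Alice", ["title", "comments.author"]))
# [{'title': 'Book A', 'comments': [{'author': 'Bob'}]}]
```

The match runs on the author field even though author is not returned, which mirrors the point that filtering happens after matching and does not influence it.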
Myth Busters - 4 Common Misconceptions
Quick: Does setting _source to false mean Elasticsearch won't store the document? Commit to yes or no.
Common Belief: If I set _source to false, Elasticsearch won't store the document source at all.
Reality: Setting _source to false in a search request only stops the source from being returned in the results; the document is still stored in the index. Disabling storage is a separate mapping setting (the _source field with enabled set to false).
Why it matters: Confusing the request option with the mapping setting can lead you either to avoid a harmless optimization or to disable _source in the mapping and lose the ability to retrieve, update, or reindex documents later.
Quick: Does source filtering improve search speed by reducing how many documents Elasticsearch scans? Commit to yes or no.
Common Belief: Source filtering makes searches faster by scanning fewer documents.
Reality: Source filtering only affects what data is returned after matching; it does not change how many documents are scanned or matched.
Why it matters: Believing this can cause wrong expectations about performance improvements.
Quick: Can you exclude a field in source filtering and still get it if it's part of a nested object? Commit to yes or no.
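Common Belief: Excluding a field always removes it from the response, even inside nested objects.
Reality: An exclude path such as "comments.text" removes that field from every element of the array in the hit's _source, but only there. The same field can still appear in other parts of the response, such as inner_hits, stored fields, or doc value fields, which are filtered separately.
Why it matters: Misunderstanding this leads to unexpected data exposure or confusion in results.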
Quick: Does source filtering affect stored fields or doc values? Commit to yes or no.
Common Belief: Source filtering controls all fields returned, including stored fields and doc values.
Reality: Source filtering only affects the _source field; stored fields and doc values are controlled separately.
Why it matters: Confusing these can cause bugs when expecting certain fields to be filtered but they still appear.
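The separation described in the last misconception shows up directly in the request body, where each mechanism has its own key. In the sketch below the field names summary and date are hypothetical, and summary is assumed to be mapped as a stored field.

```python
import json

body = {
    "query": {"match_all": {}},
    "_source": ["title"],          # filters the original JSON document only
    "stored_fields": ["summary"],  # assumption: mapped with "store": true
    "docvalue_fields": ["date"],   # read from doc values, not from _source
}
print(json.dumps(body, indent=2))
```

Changing the _source list has no effect on what the other two keys return.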
Expert Zone
1. Script fields and runtime fields are not part of _source, so source filtering cannot return them; they must be requested explicitly (for example through script_fields or fields).
2. Using complex include/exclude patterns can increase CPU load, so balancing filtering granularity with performance is key.
3. Source filtering does not affect how Elasticsearch indexes or scores documents, only what is returned, which can confuse debugging if overlooked.
When NOT to use
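Skip source filtering when you need the full document anyway, or when you fetch values from stored fields or doc values instead of _source. It is also not an access control: any client can still request the full _source, so use field-level security or remove sensitive fields at index time when data must be strictly hidden.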
Production Patterns
In production, source filtering is often combined with pagination and sorting to reduce payload size. It's also used in APIs to return lightweight responses, improving client performance and reducing bandwidth costs.
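A minimal sketch of this pattern, assuming a hypothetical helper page_request and a date field to sort on, combines from, size, sort, and _source in one body:

```python
def page_request(page, size, fields, sort_field="date"):
    """Hypothetical helper: one page of lightweight hits."""
    return {
        "from": page * size,             # offset of the first hit on this page
        "size": size,                    # hits per page
        "sort": [{sort_field: "desc"}],  # newest first, assuming a date field
        "_source": fields,               # keep each hit small
        "query": {"match_all": {}},
    }

req = page_request(page=2, size=20, fields=["title", "author"])
print(req["from"], req["size"], req["_source"])  # 40 20 ['title', 'author']
```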
Connections
Field-level security
Builds-on
Understanding source filtering helps grasp how field-level security restricts access to sensitive fields by controlling what parts of documents users can see.
REST API response optimization
Same pattern
Source filtering is a specific example of a common pattern in APIs: returning only needed data to improve speed and reduce bandwidth.
Data compression
Complementary technique
While source filtering reduces data size by removing fields, data compression reduces size by encoding data efficiently; combining both maximizes network efficiency.
Common Pitfalls
#1 Expecting source filtering to speed up query execution time.
Wrong approach: { "query": { "match_all": {} }, "_source": ["title"] } // Expecting a faster search because only title is returned
Correct approach: { "query": { "match": { "author": "Alice" } }, "_source": ["title"] } // Narrow the query to reduce matching work; use _source only to shrink the response
Root cause: Confusing data returned with query processing; filtering affects output, not how Elasticsearch finds matches.
#2 Using _source: false to hide sensitive data but still exposing it via stored fields.
Wrong approach: { "_source": false, "stored_fields": ["password"] } // Trying to hide password but still returning it
Correct approach: { "_source": false } // Do not request stored fields containing sensitive data, or use field-level security to restrict access
Root cause: Misunderstanding that _source: false hides only _source, not stored fields.
#3 Excluding nested fields expecting them to be fully removed from the response.
Wrong approach: { "_source": { "excludes": ["comments.text"] } } // Expecting comments.text to disappear everywhere, including inner_hits
Correct approach: { "_source": { "includes": ["title", "comments.author"] } } // Explicitly include only the desired fields, and apply the same filtering to inner_hits if you request them
Root cause: Assuming the top-level _source filter covers every part of the response; inner_hits, stored fields, and doc value fields are controlled separately.
Key Takeaways
Source filtering controls which parts of a document Elasticsearch returns after a search, reducing data size and improving response times.
You can include or exclude specific fields, use wildcards, and combine rules to fine-tune results.
Source filtering affects only the returned data, not how Elasticsearch searches or stores documents.
Understanding nested field behavior in source filtering prevents unexpected data exposure.
Balancing source filtering with performance considerations is key for efficient Elasticsearch use.