
Text vs keyword field types in Elasticsearch - Trade-offs & Expert Analysis

Overview - Text vs keyword field types
What is it?
In Elasticsearch, a string field in a document is usually mapped as either text or keyword. Text fields are used for full-text search: the content is analyzed at index time and broken into individual words (tokens). Keyword fields index the whole value as a single term without analysis, which makes them useful for filtering, sorting, and aggregations. Understanding the difference helps you choose the right field type for your search needs.
Why it matters
Without knowing the difference, you might store data in a way that makes searching slow or inaccurate. For example, searching for exact matches on a text field can fail because it’s analyzed into parts. This can cause wrong search results or inefficient queries, impacting user experience and system performance.
Where it fits
Before this, you should understand basic Elasticsearch concepts like documents, fields, and indexing. After this, you can learn about analyzers, mappings, and how to optimize search queries for performance and relevance.
Mental Model
Core Idea
Text fields break content into searchable words for full-text search, while keyword fields store exact values for precise matching and sorting.
Think of it like...
Think of text fields like a book index that breaks down topics into words to find pages easily, and keyword fields like a library catalog number that points exactly to one book without breaking it down.
┌───────────────┐       ┌───────────────┐
│   Text Field  │──────▶│ Analyzed into │
│ (Full-text)   │       │  words/tokens │
└───────────────┘       └───────────────┘

┌───────────────┐       ┌───────────────┐
│ Keyword Field │──────▶│ Stored as is  │
│ (Exact match) │       │ (No analysis) │
└───────────────┘       └───────────────┘
Build-Up - 7 Steps
1. Foundation: What is a Text Field?
Concept: Text fields store strings that are analyzed for full-text search.
A text field takes the input string and breaks it into smaller parts called tokens or words. For example, the sentence 'Fast cars are cool' becomes ['fast', 'cars', 'are', 'cool']. This allows Elasticsearch to find documents matching any of these words when you search.
Result
You can search for any word in the text and find matching documents, even if the exact phrase isn't typed.
Understanding that text fields break content into words explains why they are great for searching by meaning or parts of text.
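To make the tokenization concrete, here is a minimal Python sketch of a toy inverted index. The `tokenize` helper is only a rough stand-in for an analyzer (lowercase, then split on non-word characters), not Elasticsearch's actual implementation, and the document IDs and texts are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Rough stand-in for an analyzer: lowercase, then split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


# Hypothetical documents, for illustration only.
docs = {1: "Fast cars are cool", 2: "Slow cars are safe"}
index = build_inverted_index(docs)

print(tokenize(docs[1]))      # ['fast', 'cars', 'are', 'cool']
print(sorted(index["cars"]))  # [1, 2]
print(sorted(index["cool"]))  # [1]
```

Looking up a single word returns every document that contains it, which is the behavior a full-text query relies on.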
2. Foundation: What is a Keyword Field?
Concept: Keyword fields store exact values without breaking them down.
A keyword field keeps the entire string as one piece. For example, 'Fast cars are cool' is stored exactly like that. This is useful when you want to filter or sort by exact values, like tags, IDs, or categories.
Result
You can filter or sort documents by exact matches, but searching for a single word inside the value will not find it. Partial matching on a keyword field needs a wildcard, prefix, or regexp query.
Knowing keyword fields store exact values helps you use them for precise filtering and sorting.
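By contrast, a keyword field can be pictured as a lookup keyed on the whole value. The following is a toy Python model under that assumption, not Elasticsearch code; the helper names and sample documents are hypothetical.

```python
from collections import defaultdict


def build_keyword_index(docs):
    """Index each whole value as a single term, with no analysis."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        index[value].add(doc_id)
    return index


def term_lookup(index, value):
    """Return the IDs of documents whose value equals `value` exactly."""
    return sorted(index.get(value, set()))


# Hypothetical documents that differ only in letter case.
docs = {1: "Fast cars are cool", 2: "fast cars are cool"}
keyword_index = build_keyword_index(docs)

print(term_lookup(keyword_index, "Fast cars are cool"))  # [1]
print(term_lookup(keyword_index, "fast cars are cool"))  # [2]
print(term_lookup(keyword_index, "cars"))                # []
```

Only the complete, case-exact string matches; a word from the middle of the value finds nothing.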
3. Intermediate: How Text Fields Are Analyzed
Before reading on: do you think text fields keep the original string intact or break it into parts? Commit to your answer.
Concept: Text fields use analyzers to process the string into tokens for searching.
When you index a text field, Elasticsearch applies an analyzer. The default (standard) analyzer splits the text into tokens, drops most punctuation, and lowercases each token. For example, 'Fast, Cars!' becomes ['fast', 'cars']. The same analysis is applied to the query text, which makes searches case-insensitive and more flexible.
Result
Searches match words regardless of case or punctuation, improving search quality.
Understanding analyzers reveals why text search is powerful but also why exact matches can fail on text fields.
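One way to check what an analyzer does with a given string is the `_analyze` API. The sketch below only builds the request body in Python and prints it as JSON; how you send it (Kibana Dev Tools, curl, a client library) is left open, and the variable name is arbitrary.

```python
import json

# Request body for the _analyze API, which reports the tokens an analyzer produces.
analyze_request = {
    "analyzer": "standard",  # the default analyzer for text fields
    "text": "Fast, Cars!",   # the example string from this step
}

# Send as: POST /_analyze
print(json.dumps(analyze_request, indent=2))
```

With the standard analyzer, the response should list the tokens fast and cars, matching the example in this step.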
4. Intermediate: When to Use Keyword Fields
Before reading on: do you think keyword fields are good for full-text search or exact matching? Commit to your answer.
Concept: Keyword fields are best for exact matching, filtering, and sorting, not full-text search.
Use keyword fields for data like tags, IDs, email addresses, or categories where you want to find exact matches or sort results. For example, filtering all documents where 'status' is 'active' uses a keyword field.
Result
Filtering and sorting are fast and accurate because the field is stored exactly as is.
Knowing when to use keyword fields prevents inefficient or incorrect searches.
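The 'status' example could look roughly like this. It is a sketch that assumes a hypothetical index named `tickets` whose `status` field is mapped as keyword; only the request bodies are built here, as Python dicts printed as JSON.

```python
import json

# Mapping for a hypothetical "tickets" index: PUT /tickets
mapping_body = {
    "mappings": {
        "properties": {
            "status": {"type": "keyword"}
        }
    }
}

# Exact-match filter: GET /tickets/_search
# A term query inside bool.filter matches the whole value and skips scoring.
search_body = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"status": "active"}}
            ]
        }
    }
}

print(json.dumps(mapping_body, indent=2))
print(json.dumps(search_body, indent=2))
```

Because the value is compared as a whole and case-sensitively, 'active' and 'Active' are different terms here.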
5. Intermediate: Multi-fields: Combining Text and Keyword
Before reading on: do you think a field can be both text and keyword at the same time? Commit to your answer.
Concept: Elasticsearch allows storing the same data as both text and keyword using multi-fields.
You can define a field to be analyzed as text for full-text search and also stored as keyword for exact matching. For example, a 'title' field can be searched by words and also filtered by exact title using 'title.keyword'.
Result
You get the best of both worlds: flexible search and precise filtering.
Understanding multi-fields helps design flexible and efficient search indexes.
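A multi-field mapping for the 'title' example might be sketched as follows. The index name `books` is an assumption for illustration, and the bodies are plain Python dicts that serialize to the JSON Elasticsearch expects.

```python
import json

# Index creation body: PUT /books
create_index_body = {
    "mappings": {
        "properties": {
            "title": {
                "type": "text",           # analyzed, for full-text search
                "fields": {
                    "keyword": {          # addressed as "title.keyword"
                        "type": "keyword"
                    }
                },
            }
        }
    }
}

# Full-text search on the analyzed field: GET /books/_search
full_text_query = {"query": {"match": {"title": "elasticsearch basics"}}}

# Exact match on the keyword sub-field: GET /books/_search
exact_query = {"query": {"term": {"title.keyword": "Elasticsearch Basics"}}}

for body in (create_index_body, full_text_query, exact_query):
    print(json.dumps(body, indent=2))
```

The same document value feeds both fields, so one indexing request serves both query styles.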
6. Advanced: Impact on Performance and Storage
Before reading on: do you think text fields or keyword fields use more storage and CPU? Commit to your answer.
Concept: Text fields require more processing and storage due to analysis, while keyword fields are simpler and faster for exact matches.
Text fields run every value through an analyzer and store an inverted index entry for each token, which costs more CPU at index time and more disk space. Keyword fields index the exact value as one term with no analysis step, which makes filtering and sorting on them cheaper and faster.
Result
Choosing the right field type affects search speed and resource use.
Knowing the performance tradeoffs guides efficient index design.
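As one way to see the trade-off in a request, the sketch below builds a search body that runs the full-text part against the text field and the sort and aggregation against its keyword sub-field. It assumes a hypothetical `books` index whose `title` field has a `title.keyword` sub-field; the aggregation name `by_title` is made up.

```python
import json

# GET /books/_search
search_body = {
    "size": 10,
    "query": {"match": {"title": "elasticsearch"}},      # analyzed text field
    "sort": [{"title.keyword": "asc"}],                   # exact values, sortable
    "aggs": {
        "by_title": {"terms": {"field": "title.keyword"}}  # bucket by exact value
    },
}

print(json.dumps(search_body, indent=2))
```

Pointing the sort or the aggregation at `title` itself is rejected by default, because text fields do not support those operations unless fielddata is explicitly enabled.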
7. Expert: Surprises in Keyword Field Limits
Before reading on: do you think keyword fields can store unlimited length strings? Commit to your answer.
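Concept: Keyword values above a configured length can be skipped at index time, and the limit depends on how the field was mapped.
The ignore_above parameter controls this. Keyword sub-fields created by dynamic mapping (the automatic 'field.keyword') are given ignore_above: 256, so strings longer than 256 characters are not indexed for that sub-field. They are skipped entirely, not truncated, although the original value remains in _source. A keyword field you map explicitly has no practical limit by default, but Lucene still rejects any single term larger than 32,766 bytes. Either behavior can surprise users storing long values like URLs or log lines.
Result
Unless ignore_above is set deliberately, long values may be missing from term queries, sorting, and aggregations, causing unexpected search behavior.
Understanding keyword field limits prevents subtle bugs and apparent data loss in production.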
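The skip-not-truncate behavior of `ignore_above` can be modeled in a few lines of Python. This is a toy simulation, not Elasticsearch code: the function name, the sample URLs, and the `url` field are hypothetical, and the limit of 256 is the value dynamic mapping assigns to auto-created `.keyword` sub-fields.

```python
def index_keyword_values(docs, ignore_above):
    """Toy model of ignore_above: over-long values are skipped, not truncated."""
    indexed, skipped = {}, []
    for doc_id, value in docs.items():
        if len(value) > ignore_above:
            skipped.append(doc_id)   # value stays in _source but is not indexed
        else:
            indexed[doc_id] = value  # whole value indexed unchanged
    return indexed, skipped


# Hypothetical documents: one short URL and one 320-character URL.
docs = {
    1: "https://example.com/short",
    2: "https://example.com/" + "a" * 300,
}

indexed, skipped = index_keyword_values(docs, ignore_above=256)
print(sorted(indexed))  # [1]
print(skipped)          # [2]

# Mapping fragment that raises the limit for a field expected to hold long values.
url_mapping = {"url": {"type": "keyword", "ignore_above": 1024}}
```

Document 2 would still be returned by other queries, but a term query, sort, or aggregation on the limited field would not see its value.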
Under the Hood
When indexing, Elasticsearch processes text fields through analyzers that tokenize, lowercase, and filter the input, creating an inverted index mapping tokens to documents. Keyword fields skip analysis: the exact string is indexed as a single term for exact-match queries and filters, and it is also written to doc values, a columnar on-disk structure used for sorting and aggregations.
Why designed this way?
This design balances flexibility and performance. Full-text search needs tokenization for relevance and partial matches, while exact matches require fast, precise lookups. Separating these types avoids slowing down queries and keeps storage efficient.
┌───────────────┐        ┌───────────────┐
│ Input String  │        │ Input String  │
└──────┬────────┘        └──────┬────────┘
       │                        │
       ▼                        ▼
┌───────────────┐        ┌───────────────┐
│ Analyzer      │        │ No Analyzer   │
│ (tokenizes)   │        │ (stores raw)  │
└──────┬────────┘        └──────┬────────┘
       │                        │
       ▼                        ▼
┌───────────────┐        ┌───────────────┐
│ Inverted Index│        │ Exact Storage │
│ (tokens → doc)│        │ (keyword data)│
└───────────────┘        └───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do you think searching a keyword field finds partial matches inside the string? Commit yes or no.
Common Belief: Keyword fields support partial matching like text fields do.
Reality: Term queries on keyword fields match only the whole value. Partial matches require a text field or a pattern query such as prefix, wildcard, or regexp, which can be expensive on large indexes.
Why it matters: Assuming keyword fields support partial matches leads to failed searches and frustrated users.
Quick: Do you think text fields store the original string exactly as is? Commit yes or no.
Common Belief: Text fields keep the original string intact for retrieval and search.
Reality: For search, text fields are analyzed and broken into tokens, so the original string is not indexed as a single unit. The original value is still returned from _source when you retrieve the document.
Why it matters: Expecting an exact whole-value match on a text field causes confusion and incorrect results. Phrase matching needs a match_phrase query, and exact matching needs a keyword field.
Quick: Do you think keyword fields can store very long strings without limits? Commit yes or no.
Common Belief: Keyword fields can store strings of any length without issue.
Reality: Keyword sub-fields created by dynamic mapping use ignore_above: 256 and skip indexing of longer strings. Explicitly mapped keyword fields have no such limit by default, but Lucene rejects single terms larger than 32,766 bytes.
Why it matters: Ignoring these limits can make long values unsearchable or cause indexing errors in production systems.
Quick: Do you think multi-fields duplicate data and waste storage? Commit yes or no.
Common Belief: Using multi-fields to store both text and keyword versions wastes a lot of storage.
Reality: The original value is stored once in _source, but each sub-field is indexed separately, so a keyword sub-field adds its own index structures and doc values. The cost is real but usually modest compared with the flexibility gained.
Why it matters: Avoiding multi-fields purely out of storage fears limits search capabilities unnecessarily.
Expert Zone
1. Keyword fields are case-sensitive by default, so filtering on 'Active' vs 'active' differs unless the field is given a normalizer.
2. Text fields can use custom analyzers to control tokenization, which strongly affects search precision and recall.
3. Multi-fields allow different analyzers on the same data, enabling complex search patterns without duplicating the field in your documents.
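For the case-sensitivity point, a keyword field can be given a normalizer so that values are lowercased before indexing. The sketch below builds such an index body in Python; the index name `tickets` and the normalizer name `lowercase_normalizer` are hypothetical choices for illustration.

```python
import json

# Index creation body: PUT /tickets
create_index_body = {
    "settings": {
        "analysis": {
            "normalizer": {
                "lowercase_normalizer": {   # hypothetical name
                    "type": "custom",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "status": {
                "type": "keyword",
                "normalizer": "lowercase_normalizer",
            }
        }
    },
}

print(json.dumps(create_index_body, indent=2))
```

With this mapping the value is still kept as one term, so the field remains suitable for filtering, sorting, and aggregations, but 'Active' and 'active' are indexed as the same term.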
When NOT to use
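Avoid using text fields for exact filtering, sorting, or aggregations. Term-level filters on them are imprecise because the value was analyzed, and sorting or aggregating requires fielddata, which is disabled by default and expensive in memory. Use keyword fields instead. Conversely, do not use keyword fields for full-text search or relevance ranking; use text fields with full-text queries such as 'match'.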
Production Patterns
In production, it’s common to define fields as multi-fields with a text type for search and a keyword subfield for filtering and sorting. This pattern balances flexibility and performance. Also, adjusting keyword length limits and analyzer settings is standard to handle real-world data.
Connections
Inverted Index
Text fields build inverted indexes from tokens, keyword fields store exact values without tokenization.
Understanding inverted indexes clarifies why text fields support full-text search and keyword fields do not.
Data Normalization
Text field analyzers normalize data (lowercase, remove punctuation), keyword fields do not.
Knowing normalization helps explain why searches on text fields are case-insensitive but keyword filters are case-sensitive.
Library Cataloging Systems
Keyword fields act like catalog numbers for exact identification, text fields like subject indexes for searching topics.
This cross-domain link shows how organizing information for exact lookup versus flexible search is a common challenge.
Common Pitfalls
#1 Filtering on a text field expecting exact matches.
Wrong approach: GET /books/_search { "query": { "term": { "title": "Elasticsearch Basics" } } }
Correct approach: GET /books/_search { "query": { "term": { "title.keyword": "Elasticsearch Basics" } } }
Root cause:Text fields are analyzed and broken into tokens, so exact term queries fail unless using the keyword subfield.
#2 Using keyword fields for full-text search queries.
Wrong approach: GET /articles/_search { "query": { "match": { "content.keyword": "fast cars" } } }
Correct approach: GET /articles/_search { "query": { "match": { "content": "fast cars" } } }
Root cause: Keyword fields are not tokenized, so a match query on one compares the whole query string against the whole indexed value, and only an exact, case-sensitive match hits.
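#3 Relying on dynamic mapping for long keyword values, so they are skipped at index time (not truncated).
Wrong approach: Letting dynamic mapping create the field, which produces: "tags": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }
Correct approach: Mapping the field explicitly with a limit sized for the data: "tags": { "type": "keyword", "ignore_above": 512 }
Root cause: The ignore_above: 256 set by dynamic mapping skips indexing of strings longer than 256 characters. They stay in _source but never match queries, sorts, or aggregations on the keyword sub-field.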
Key Takeaways
Text fields are designed for full-text search by breaking content into searchable words.
Keyword fields store exact values for precise filtering, sorting, and aggregations.
Multi-fields let you store the same data as both text and keyword for flexible queries.
Choosing the right field type affects search accuracy, performance, and storage.
Understanding analyzer behavior and keyword limits prevents common search mistakes.