How to Use to_tsvector in PostgreSQL for Full-Text Search
In PostgreSQL, use to_tsvector to convert plain text into a searchable document vector by breaking it into lexemes. This function prepares text for full-text search by normalizing and tokenizing it, making it easy to match search queries.
Syntax
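The to_tsvector function converts text into a value of type tsvector, which is a sorted list of distinct lexemes (normalized words) with their positions in the original text. It takes an optional configuration parameter that selects the language rules for stemming and stop words; when it is omitted, the default_text_search_config setting is used.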
to_tsvector([config,] text)
- config: Optional text search configuration (e.g., 'english').
- text: The input string to be processed.
sql
to_tsvector('english', 'The quick brown fox jumps over the lazy dog')
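When config is omitted, PostgreSQL falls back to the default_text_search_config setting. The following is a minimal sketch for checking what the one-argument form will use in the current session. The result depends on your installation, so no output is shown.

```sql
-- List the text search configurations available in this database.
SELECT cfgname FROM pg_ts_config;

-- Show the configuration used when config is omitted.
SHOW default_text_search_config;

-- Set it for the current session; the one-argument form now
-- behaves like to_tsvector('english', ...).
SET default_text_search_config = 'pg_catalog.english';
SELECT to_tsvector('The quick brown fox jumps over the lazy dog');
```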
Example
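This example shows how to_tsvector converts a sentence into lexemes for full-text search. It normalizes words (for example, 'jumps' becomes 'jump' and 'lazy' becomes 'lazi') and removes stop words such as 'the' and 'over'. The number after each lexeme is the word's position in the original text.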
sql
SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog');
Output
'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2
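To see what the 'english' configuration contributes, it can help to run the same sentence through the built-in 'simple' configuration, which lowercases tokens but does not stem them or remove stop words. A short comparison sketch:

```sql
-- 'simple' keeps every word as-is (lowercased), including stop words.
SELECT to_tsvector('simple', 'The quick brown fox jumps over the lazy dog');
-- 'brown':3 'dog':9 'fox':4 'jumps':5 'lazy':8 'over':6 'quick':2 'the':1,7
```

Here 'the' appears at positions 1 and 7, and 'jumps' and 'lazy' are left unstemmed.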
Common Pitfalls
Common mistakes include:
- Not specifying the correct language configuration, which affects stemming and stop words.
- Passing NULL, which returns NULL rather than an empty tsvector. An empty string returns an empty tsvector.
- Expecting to_tsvector to perform search; it only prepares text for searching.
Always pair to_tsvector with to_tsquery or plainto_tsquery for searching.
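As a minimal sketch of that pairing, the match operator @@ tests a tsvector against a tsquery. The sentence is the one used earlier in this article; the search terms are chosen only for illustration.

```sql
-- to_tsquery expects operators (&, |, !) between terms.
SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog')
       @@ to_tsquery('english', 'fox & dog') AS matches;
-- true

-- plainto_tsquery accepts plain text and ANDs the terms together.
-- 'jumping' and 'dogs' are stemmed to 'jump' and 'dog', so they match.
SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog')
       @@ plainto_tsquery('english', 'jumping dogs') AS matches;
-- true

-- A term that is not in the document does not match.
SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog')
       @@ to_tsquery('english', 'cat') AS matches;
-- false
```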
sql
/* Wrong: No config, may use default which might not suit your language */
SELECT to_tsvector('The quick brown fox');

/* Right: Specify language for proper stemming */
SELECT to_tsvector('english', 'The quick brown fox');
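NULL and empty input behave differently, which matters when a document is built from nullable columns. A small sketch of both cases follows, with coalesce used as one common way to guard against NULL. The literal values stand in for column references.

```sql
-- NULL in, NULL out.
SELECT to_tsvector('english', NULL::text) IS NULL AS is_null;
-- true

-- An empty string yields an empty tsvector (zero lexemes).
SELECT length(to_tsvector('english', '')) AS lexeme_count;
-- 0

-- Concatenating a NULL value would make the whole document NULL,
-- so wrap nullable parts in coalesce.
SELECT to_tsvector('english', coalesce(NULL::text, '') || ' ' || 'quick brown fox');
-- 'brown':2 'fox':3 'quick':1
```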
Quick Reference
| Item | Description |
|---|---|
| config | Optional language configuration like 'english', 'simple', etc. |
| text | Input string to convert into tsvector |
| Return Type | tsvector - searchable document vector |
| Usage | Prepare text for full-text search queries |
Key Takeaways
Use to_tsvector to convert text into searchable lexemes for full-text search.
Specify the correct language configuration to get proper stemming and stop word removal.
to_tsvector only prepares text; combine it with to_tsquery for searching.
An empty string returns an empty tsvector; NULL input returns NULL.
Use the same text search configuration in to_tsvector for stored text and in to_tsquery for search queries to get accurate matching.
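For stored text, consistency is mostly about the configuration: the query side should be built with the same configuration as the document side. The sketch below uses a hypothetical temporary table named articles with a body column. Neither name comes from this article.

```sql
-- Hypothetical table, for illustration only.
CREATE TEMP TABLE articles (id serial PRIMARY KEY, body text);
INSERT INTO articles (body) VALUES
  ('The quick brown fox jumps over the lazy dog'),
  ('PostgreSQL supports full-text search');

-- Same configuration on both sides: 'jumping' and 'jumps' both stem to 'jump'.
SELECT id FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'jumping');
-- returns the row with id 1

-- Mismatched configurations: 'simple' leaves 'jumping' unstemmed,
-- so it does not match the stored lexeme 'jump'.
SELECT id FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('simple', 'jumping');
-- returns no rows
```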