to_tsvector for document conversion in PostgreSQL - Time & Space Complexity
We want to understand how the time to convert text into searchable tokens grows as the text gets longer.
How does the processing time change when the document size increases?
Analyze the time complexity of the following PostgreSQL code snippet.
SELECT to_tsvector('english', document_text) FROM documents;
This query converts the text in the document_text column of each row into a tsvector: a sorted list of normalized lexemes (stemmed words with stop words removed) that can be searched efficiently.
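As a concrete illustration of what the conversion produces (the exact lexemes depend on the 'english' text search configuration), to_tsvector stems each word, drops stop words, and records word positions:

```sql
-- With the 'english' configuration, "The" is dropped as a stop word
-- and "foxes" is stemmed to "fox"; positions refer to the original
-- word order. The result typically looks like:
--   'brown':3 'fox':4 'quick':2
SELECT to_tsvector('english', 'The quick brown foxes');
```

Every word in the input is visited once to produce this output, which is the source of the linear cost discussed below.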
Look for repeated work inside the function.
- Primary operation: Scanning each word in the document text to normalize and index it.
- How many times: Once for every word in the document text.
The time to process grows roughly in direct proportion to the number of words in the document.
| Input Size (words) | Approx. Operations |
|---|---|
| 10 | About 10 word scans and normalizations |
| 100 | About 100 word scans and normalizations |
| 1000 | About 1000 word scans and normalizations |
Pattern observation: Doubling the number of words roughly doubles the work done.
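You can check this pattern yourself with a rough experiment (a sketch only; the repeat counts are illustrative and absolute timings will vary by machine). Build synthetic documents of increasing length with repeat() and time the conversion with EXPLAIN ANALYZE:

```sql
-- ~1,000 words: 250 repetitions of a 4-word phrase.
EXPLAIN ANALYZE
SELECT to_tsvector('english', repeat('quick brown fox jumps ', 250));

-- ~2,000 words: doubling the input should roughly double the
-- reported execution time, matching the linear pattern above.
EXPLAIN ANALYZE
SELECT to_tsvector('english', repeat('quick brown fox jumps ', 500));
```

Comparing the execution times of the two statements gives a quick empirical sense of the linear growth.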
Time Complexity: O(n), where n is the number of words in the document.
This means the time to convert the text grows linearly with document length: each word must be scanned and normalized exactly once.
[X] Wrong: "The function processes the whole document instantly regardless of size."
[OK] Correct: The function must look at each word to convert it, so bigger documents take more time.
Understanding how text processing scales helps you explain performance in search features and indexing tasks.
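One common consequence of this O(n) cost: rather than re-running to_tsvector on every search, production schemas usually pay the linear conversion once at write time and index the result. A minimal sketch (assumes PostgreSQL 12+ for generated columns, and the documents table from the earlier query):

```sql
-- Store the tsvector once per row, kept in sync automatically,
-- so the O(n) conversion is not repeated on every search.
ALTER TABLE documents
  ADD COLUMN document_tsv tsvector
  GENERATED ALWAYS AS (to_tsvector('english', document_text)) STORED;

-- A GIN index lets searches look up lexemes directly instead of
-- scanning and converting each document's text.
CREATE INDEX documents_tsv_idx ON documents USING GIN (document_tsv);
```

With this in place, queries match against document_tsv, and the per-document conversion cost is incurred only when a row is inserted or its text changes.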
"What if the document text is pre-tokenized into an array of words before calling to_tsvector? How would the time complexity change?"