Overview - Ranking with ts_rank

What is it?

Ranking with ts_rank is a way to measure how well a text matches a search query in PostgreSQL. It gives a score to each row based on how relevant the text is to the search terms. This helps to sort results so the most relevant ones appear first. It is used with full-text search features in PostgreSQL.

Why it matters

Without ranking, search results would appear in random order, making it hard to find the best matches quickly. Ranking helps users see the most important or relevant information first, improving search experience and saving time. It solves the problem of sorting many possible matches by their usefulness.

Where it fits

Before learning ts_rank, you should understand basic SQL queries and PostgreSQL full-text search concepts like tsvector and tsquery. After mastering ts_rank, you can explore advanced ranking functions, custom weighting, and performance tuning for large text searches.

Mental Model

Core Idea

ts_rank scores how well a piece of text matches a search query, letting you order results by relevance.

Think of it like...

Imagine looking for a book in a library. ts_rank is like a helper who checks how many times your search words appear in each book and how important those words are, then tells you which books to look at first.

Search Query --> [Text 1: score 0.8] --> Result 1
             [Text 2: score 0.3] --> Result 2
             [Text 3: score 0.5] --> Result 3

Results sorted by score descending:
Result 1 (0.8)
Result 3 (0.5)
Result 2 (0.3)

Build-Up - 6 Steps

1

FoundationUnderstanding Full-Text Search Basics

Concept: Learn what full-text search is and how PostgreSQL stores searchable text.

PostgreSQL uses a special data type called tsvector to store text in a way that makes searching fast. A tsquery is a search phrase that PostgreSQL uses to find matching text. For example, 'cat & dog' means both words must appear.

Result

You can create tsvector columns and search them with tsquery to find matching rows.

Knowing how text is prepared and searched is essential before ranking results by relevance.

2

FoundationUsing ts_rank to Score Matches

3

IntermediateSorting Results by Relevance Score

4

IntermediateApplying Weights to Different Text Parts

5

AdvancedUsing ts_rank_cd for Coverage Density Ranking

6

ExpertPerformance and Indexing with Ranking

Under the Hood

PostgreSQL converts text into tsvector, which stores lexemes (words) and their positions. The tsquery represents search terms. ts_rank compares the query to the tsvector, scoring matches based on frequency, position, and weights. It calculates a floating-point score by summing weighted matches and normalizing by document length.

Why designed this way?

ts_rank was designed to provide a simple, fast way to rank text search results by relevance. It balances accuracy and performance by using preprocessed tsvector data and efficient algorithms. Alternatives like exact phrase matching are slower, so ts_rank offers a practical compromise.

┌─────────────┐       ┌─────────────┐       ┌───────────────┐
│  Input Text │──────▶│  tsvector   │──────▶│ ts_rank Score │
└─────────────┘       └─────────────┘       └───────────────┘
                             ▲                      ▲
                             │                      │
                      ┌─────────────┐       ┌─────────────┐
                      │  tsquery    │──────▶│  Ranking    │
                      └─────────────┘       └─────────────┘

Myth Busters - 4 Common Misconceptions

Quick: Does a higher ts_rank score always mean the text is a perfect match? Commit yes or no.

Common Belief:A higher ts_rank score means the text perfectly matches the search query.

Tap to reveal reality

Quick: Is ts_rank computed before or after filtering with @@ operator? Commit your answer.

Common Belief:ts_rank is computed on all rows in the table regardless of matching.

Tap to reveal reality

Quick: Does ts_rank consider word order in ranking? Commit yes or no.

Common Belief:ts_rank always considers the order of words in the text when ranking.

Tap to reveal reality

Quick: Can you use ts_rank without creating a tsvector column? Commit yes or no.

Common Belief:You can use ts_rank directly on plain text columns without preprocessing.

Tap to reveal reality

Expert Zone

1

ts_rank's normalization by document length can cause short documents with few matches to score higher than longer ones with more matches.

2

Weights assigned to lexemes can be customized per language configuration, affecting ranking in multilingual databases.

3

ts_rank_cd's coverage density calculation can be more CPU intensive but yields better relevance when phrase proximity matters.

When NOT to use

Avoid ts_rank when exact phrase matching or semantic search is required; consider extensions like pg_trgm or external tools like Elasticsearch for those cases.

Production Patterns

In production, ts_rank is combined with GIN indexes and weighted tsvector columns (e.g., title weighted higher than body). Queries often limit results by minimum rank thresholds and paginate results for user-friendly search interfaces.

Connections

Information Retrieval

ts_rank implements a basic form of relevance scoring used in information retrieval systems.

Understanding ts_rank connects database search to broader search engine principles like TF-IDF and ranking algorithms.

Weighted Scoring Systems

ts_rank uses weights to prioritize parts of text, similar to weighted scoring in decision making or grading.

Knowing weighted scoring helps grasp how ts_rank balances different text sections for relevance.

Human Attention and Prioritization

Ranking search results mimics how humans prioritize information based on importance and context.

Recognizing this connection helps design better user experiences by aligning search ranking with human focus.

Common Pitfalls

#1Not using @@ operator before ts_rank causes slow queries.

Wrong approach:SELECT ts_rank(to_tsvector(text), to_tsquery('cat & dog')) FROM documents;

Correct approach:SELECT ts_rank(to_tsvector(text), to_tsquery('cat & dog')) FROM documents WHERE to_tsvector(text) @@ to_tsquery('cat & dog');

Root cause:Forgetting that ts_rank should only be computed on matching rows filtered by @@ to avoid scanning all rows.

#2Using ts_rank on plain text without converting to tsvector.

Wrong approach:SELECT ts_rank(text, to_tsquery('cat')) FROM documents WHERE text @@ to_tsquery('cat');

Correct approach:SELECT ts_rank(to_tsvector(text), to_tsquery('cat')) FROM documents WHERE to_tsvector(text) @@ to_tsquery('cat');

Root cause:Misunderstanding that ts_rank requires tsvector inputs, not plain text.

#3Ignoring weights and treating all text parts equally.

Wrong approach:SELECT ts_rank(to_tsvector(title || ' ' || body), query) FROM docs WHERE to_tsvector(title || ' ' || body) @@ query;

Correct approach:SELECT ts_rank(setweight(to_tsvector(title), 'A') || setweight(to_tsvector(body), 'B'), query) FROM docs WHERE (setweight(to_tsvector(title), 'A') || setweight(to_tsvector(body), 'B')) @@ query;

Root cause:Not using setweight leads to less meaningful ranking when some text parts are more important.

Key Takeaways

ts_rank scores how well text matches a search query, enabling sorting by relevance.

It requires text to be converted into tsvector and queries into tsquery before ranking.

Weights can be applied to prioritize important parts of the text for better ranking.

ts_rank_cd is a variant that considers how close query words appear together, improving relevance.

Proper indexing and filtering with @@ are essential for fast and scalable ranked searches.