
Re-ranking retrieved results in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Re-ranking retrieved results
Problem: You have a search system that retrieves a list of documents for a query. The initial ranking is based on simple keyword matching. However, the top results are not very relevant. You want to improve the order of these results by re-ranking them using a machine learning model that considers semantic similarity.
Current Metrics: Initial ranking precision@5: 60%, recall@5: 55%
Issue: The initial ranking uses only keyword matching, which misses semantic relevance. This causes lower precision and recall in the top results.
Your Task
Improve the ranking of the top 5 retrieved documents by training a re-ranking model that increases precision@5 from 60% to at least 75% and raises recall@5 from 55% to above 60%.
You can only change the re-ranking model and its training.
The initial retrieval method (keyword matching) must remain unchanged.
Use a simple neural network for re-ranking based on semantic embeddings.
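The targets above are stated as precision@5 and recall@5. As a reference point, here is a minimal sketch of how these two metrics are commonly computed for one query. The function names and the example labels are illustrative assumptions, not part of the exercise.

```python
def precision_at_k(ranked_labels, k):
    """Fraction of the top-k ranked items that are relevant (label 1)."""
    return sum(ranked_labels[:k]) / k


def recall_at_k(ranked_labels, k):
    """Fraction of all relevant items that appear in the top k."""
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        return 0.0  # no relevant items exist for this query
    return sum(ranked_labels[:k]) / total_relevant


# Hypothetical relevance labels for 10 candidates, already in ranked order.
ranked = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
p5 = precision_at_k(ranked, 5)  # 3 relevant items in the top 5 -> 0.6
r5 = recall_at_k(ranked, 5)     # 3 of the 5 relevant items found -> 0.6
print(p5, r5)
```

Note that both metrics only change under re-ranking when the candidate pool is larger than k; otherwise the top k is always the whole pool.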
Solution
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, losses, optimizers

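# Simulated data: queries, documents, and relevance labels.
# For simplicity, both the embeddings and the labels are random, so there is no
# real signal for the model to learn. In practice, use real embeddings and
# human or click-derived relevance labels.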
np.random.seed(42)
num_samples = 1000
embedding_dim = 50

# Generate random embeddings for queries and documents
query_embeddings = np.random.rand(num_samples, embedding_dim).astype(np.float32)
doc_embeddings = np.random.rand(num_samples, embedding_dim).astype(np.float32)

# Generate binary relevance labels (1=relevant, 0=not relevant)
labels = np.random.randint(0, 2, size=(num_samples, 1)).astype(np.float32)

# Combine query and doc embeddings as input features
inputs = np.concatenate([query_embeddings, doc_embeddings], axis=1)

# Split into train and validation sets
split = int(0.8 * num_samples)
X_train, X_val = inputs[:split], inputs[split:]
y_train, y_val = labels[:split], labels[split:]

# Define a simple neural network for re-ranking
model = models.Sequential([
    layers.Input(shape=(embedding_dim * 2,)),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(32, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
              loss=losses.BinaryCrossentropy(),
              metrics=['accuracy'])

# Train the model
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_val, y_val), verbose=0)

# Evaluate precision@5 and recall@5 on the validation set.
# Simulate grouping by query: assume each consecutive block of 20 validation
# samples holds the candidates retrieved for one query. The pool must be larger
# than k; with exactly 5 candidates the top 5 is the whole pool, so re-ranking
# could not change either metric.
val_preds = model.predict(X_val, verbose=0).flatten()

k = 5
candidates_per_query = 20
num_val_queries = len(X_val) // candidates_per_query
precision_at_5 = []
recall_at_5 = []

for i in range(num_val_queries):
    start = i * candidates_per_query
    end = start + candidates_per_query
    scores = val_preds[start:end]
    true_labels = y_val[start:end].flatten()
    # Indices of the k highest-scoring candidates for this query
    top_k = np.argsort(scores)[::-1][:k]
    # Calculate precision@5 and recall@5
    relevant_retrieved = np.sum(true_labels[top_k])
    total_relevant = np.sum(true_labels)
    precision = relevant_retrieved / k
    recall = relevant_retrieved / total_relevant if total_relevant > 0 else 0
    precision_at_5.append(precision)
    recall_at_5.append(recall)

avg_precision_at_5 = np.mean(precision_at_5) * 100
avg_recall_at_5 = np.mean(recall_at_5) * 100

print(f"Precision@5 after re-ranking: {avg_precision_at_5:.2f}%")
print(f"Recall@5 after re-ranking: {avg_recall_at_5:.2f}%")
Added a neural network re-ranking model that takes combined query and document embeddings as input.
Used a binary classification approach to predict relevance scores for query-document pairs.
Trained the model on labeled data to learn semantic relevance beyond keyword matching.
Evaluated precision@5 and recall@5 by sorting documents per query using predicted scores.
Results Interpretation

Before re-ranking: Precision@5 = 60%, Recall@5 = 55%

After re-ranking: Precision@5 = 78.5%, Recall@5 = 62.3%

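Using a learned re-ranking model that captures semantic similarity can significantly improve the relevance of top search results compared to simple keyword matching. Note that the script above trains on random embeddings and random labels, so running it as-is will not reproduce these figures; treat them as an illustration of the target outcome on real embeddings and relevance labels.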
Bonus Experiment
Try using a pairwise ranking loss (like hinge loss) instead of binary classification to train the re-ranking model.
💡 Hint
Pairwise loss trains the model to prefer relevant documents over non-relevant ones directly, which can improve ranking quality.
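To make the hint concrete, here is a minimal sketch of a pairwise hinge loss for a single query, written in plain Python so the arithmetic is visible. The function name, the margin of 1.0, and the example scores are assumptions chosen for illustration; a real experiment would compute the same quantity on raw (pre-sigmoid) model outputs inside the training loop.

```python
def pairwise_hinge_loss(scores, labels, margin=1.0):
    """Mean hinge loss over all (relevant, non-relevant) pairs of one query.

    Each pair contributes max(0, margin - (score_relevant - score_non_relevant)),
    so the loss is zero once every relevant document outscores every
    non-relevant one by at least `margin`.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.0  # no pairs to compare for this query
    pair_losses = [max(0.0, margin - (p - n)) for p in pos for n in neg]
    return sum(pair_losses) / len(pair_losses)


labels = [1, 0, 0]  # hypothetical: first document is the only relevant one
good = pairwise_hinge_loss([2.0, 0.5, -1.0], labels)  # relevant doc ranked first
bad = pairwise_hinge_loss([-1.0, 0.5, 2.0], labels)   # relevant doc ranked last
print(good, bad)  # 0.0 3.25
```

Unlike binary cross-entropy, this loss depends only on score differences within a query, which matches what a ranking metric actually measures.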

Practice

1.

What is the main purpose of re-ranking retrieved results in a search system?

easy
A. To sort the initial search results again using a better scoring method
B. To remove duplicate results from the search output
C. To speed up the initial search query processing
D. To translate results into different languages

Solution

  1. Step 1: Understand the role of re-ranking

    Re-ranking means sorting results again after the first search to improve order.
  2. Step 2: Identify the goal of re-ranking

    The goal is to use a smarter scoring method to show the most relevant results at the top.
  3. Final Answer:

    To sort the initial search results again using a better scoring method -> Option A
  4. Quick Check:

    Re-ranking = better sorting [OK]
Hint: Re-ranking means sorting results again for better relevance [OK]
Common Mistakes:
  • Confusing re-ranking with removing duplicates
  • Thinking re-ranking speeds up initial search
  • Assuming re-ranking translates results
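The two-stage idea behind this question can be sketched as follows. This is a toy illustration under stated assumptions: `keyword_retrieve`, `rerank`, and the `model_scores` lookup are hypothetical stand-ins (the lookup plays the role of a trained re-ranking model), not any library's API.

```python
def keyword_retrieve(query, docs, top_n=3):
    """Stage 1: rank documents by how many query words they contain."""
    words = set(query.lower().split())
    scored = [(len(words & set(d.lower().split())), d) for d in docs]
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [d for _, d in scored[:top_n]]


def rerank(candidates, score_fn):
    """Stage 2: re-order the stage-1 candidates with a better scoring function."""
    return sorted(candidates, key=score_fn, reverse=True)


docs = [
    "python snake care guide",
    "python programming tutorial",
    "java programming tutorial",
]
candidates = keyword_retrieve("python tutorial", docs)

# Hypothetical semantic relevance scores standing in for a trained model.
model_scores = {
    "python programming tutorial": 0.95,
    "java programming tutorial": 0.60,
    "python snake care guide": 0.10,
}
reranked = rerank(candidates, lambda d: model_scores[d])
print(candidates)
print(reranked)
```

Re-ranking only re-orders what the first stage returned; it neither adds documents nor makes the initial retrieval faster.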
2.

Which of the following code snippets correctly represents a simple re-ranking step that sorts a list of results by their score in descending order?

results = [{'id': 1, 'score': 0.5}, {'id': 2, 'score': 0.9}, {'id': 3, 'score': 0.7}]
# Re-rank results here
easy
A. results.sort(reverse=True)
B. results.sort(key=lambda x: x['id'])
C. results.sort(key=lambda x: x['score'])
D. results.sort(key=lambda x: x['score'], reverse=True)

Solution

  1. Step 1: Identify sorting by score descending

    We want to sort by 'score' in descending order, so reverse=True is needed.
  2. Step 2: Check each option

    results.sort(key=lambda x: x['score'], reverse=True) sorts by 'score' with reverse=True, which is correct. The others sort by 'id', sort by score in ascending order, or omit the key entirely, which raises a TypeError because dicts cannot be compared with "<".
  3. Final Answer:

    results.sort(key=lambda x: x['score'], reverse=True) -> Option D
  4. Quick Check:

    Sort by score descending = results.sort(key=lambda x: x['score'], reverse=True) [OK]
Hint: Sort with key and reverse=True for descending order [OK]
Common Mistakes:
  • Forgetting reverse=True for descending sort
  • Sorting by wrong key like 'id'
  • Calling sort() without a key, which raises a TypeError for a list of dicts
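A short sketch of why the key matters, using the same `results` list as the question. The variable names `option_a_error`, `by_score_desc`, and `ids` are introduced here for illustration only.

```python
results = [{'id': 1, 'score': 0.5}, {'id': 2, 'score': 0.9}, {'id': 3, 'score': 0.7}]

# Option A: no key, so Python compares the dicts themselves with "<" and fails.
try:
    list(results).sort(reverse=True)  # sort a copy so `results` is untouched
    option_a_error = None
except TypeError as exc:
    option_a_error = type(exc).__name__

# Option D: sort by the 'score' field, highest first.
# sorted() returns a new list and leaves `results` unchanged.
by_score_desc = sorted(results, key=lambda x: x['score'], reverse=True)
ids = [r['id'] for r in by_score_desc]

print(option_a_error)  # TypeError
print(ids)             # [2, 3, 1]
```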
3.

Given the following code that re-ranks search results by a new score, what will be the output after re-ranking?

results = [
  {'id': 'a', 'score': 0.3},
  {'id': 'b', 'score': 0.8},
  {'id': 'c', 'score': 0.5}
]

# New scores from a re-ranker
new_scores = {'a': 0.9, 'b': 0.4, 'c': 0.7}

for r in results:
    r['score'] = new_scores[r['id']]

results.sort(key=lambda x: x['score'], reverse=True)
print([r['id'] for r in results])
medium
A. ['b', 'c', 'a']
B. ['a', 'c', 'b']
C. ['c', 'a', 'b']
D. ['a', 'b', 'c']

Solution

  1. Step 1: Update scores with new_scores

    Results get scores: 'a' = 0.9, 'b' = 0.4, 'c' = 0.7.
  2. Step 2: Sort results by updated score descending

    Sorted order by score: 0.9 ('a'), 0.7 ('c'), 0.4 ('b').
  3. Final Answer:

    ['a', 'c', 'b'] -> Option B
  4. Quick Check:

    Sort by new scores descending = ['a', 'c', 'b'] [OK]
Hint: Replace scores then sort descending by score [OK]
Common Mistakes:
  • Sorting by old scores instead of new
  • Sorting ascending instead of descending
  • Mixing up ids and scores
4.

Identify the error in this re-ranking code snippet and select the fix:

results = [{'id': 1, 'score': 0.2}, {'id': 2, 'score': 0.5}]
new_scores = {1: 0.7, 2: 0.9}

for r in results:
    r['score'] = new_scores[r['id']]

results.sort(key=lambda x: x['score'], reverse=True)
print(results)
medium
A. Use sorted() instead of sort() to avoid in-place sorting
B. Change new_scores keys to strings to match 'id' type
C. No error; code runs correctly and sorts results
D. Add a try-except block to handle missing keys

Solution

  1. Step 1: Check key types in new_scores and results

    Both use integer keys for 'id', so lookup works correctly.
  2. Step 2: Verify sorting and printing

    Sorting by updated 'score' descending is valid and prints sorted list.
  3. Final Answer:

    No error; code runs correctly and sorts results -> Option C
  4. Quick Check:

    Matching key types = no error [OK]
Hint: Check key types match for dictionary lookups [OK]
Common Mistakes:
  • Assuming string keys when they are integers
  • Thinking sort() causes error without reason
  • Adding unnecessary try-except blocks
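For contrast, here is a sketch of the situation the hint warns about, where the key types do not match. The string-keyed `new_scores` is a hypothetical variant of the question's data (JSON object keys, for example, are always strings), and the `.get` fallback is one possible defensive choice, not the only one.

```python
results = [{'id': 1, 'score': 0.2}, {'id': 2, 'score': 0.5}]
new_scores = {'1': 0.7, '2': 0.9}  # hypothetical: string keys instead of ints

# A direct lookup with an int id fails because 1 != '1'.
try:
    new_scores[results[0]['id']]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Defensive variant: normalise the id to str and keep the old score
# when the re-ranker has no score for a document.
for r in results:
    r['score'] = new_scores.get(str(r['id']), r['score'])

results.sort(key=lambda x: x['score'], reverse=True)
ids = [r['id'] for r in results]
print(lookup_failed, ids)  # True [2, 1]
```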
5.

You have a list of 5 retrieved documents with initial scores. You want to re-rank them using a machine learning model that outputs a relevance score. Which approach best improves the final ranking?

hard
A. Combine initial and model scores by averaging, then sort descending
B. Use the model scores to replace initial scores and sort descending
C. Sort only by initial scores, ignoring model scores
D. Randomly shuffle results to avoid bias

Solution

  1. Step 1: Understand re-ranking with model scores

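    Replacing the initial scores outright discards the keyword-match signal; combining the two scores keeps both sources of evidence.
  2. Step 2: Evaluate options for best ranking

    Averaging initial and model scores uses all the available information, which can improve relevance and stability, provided the two scores are on comparable scales (normalize them first if they are not).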
  3. Final Answer:

    Combine initial and model scores by averaging, then sort descending -> Option A
  4. Quick Check:

    Combine scores for best re-ranking [OK]
Hint: Blend initial and model scores for better ranking [OK]
Common Mistakes:
  • Blindly replacing scores and losing the initial signal
  • Ignoring model scores completely
  • Random shuffling breaks relevance
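A minimal sketch of the blended approach from the answer, assuming the initial keyword scores and the model scores are on different scales and therefore need normalizing before they are averaged. The helper names `min_max` and `fuse`, the `weight` parameter, and all score values are hypothetical.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse(docs, initial, model, weight=0.5):
    """Rank docs by a weighted average of normalized initial and model scores.

    weight=0.5 is a plain average; weight=1.0 uses the model scores only.
    """
    init_n = min_max(initial)
    model_n = min_max(model)
    combined = [(1 - weight) * a + weight * b for a, b in zip(init_n, model_n)]
    order = sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)
    return [docs[i] for i in order]


docs = ['d1', 'd2', 'd3', 'd4', 'd5']
initial = [12.0, 9.5, 9.0, 7.0, 3.0]     # hypothetical keyword scores (unbounded)
model = [0.20, 0.90, 0.40, 0.85, 0.10]   # hypothetical model scores in [0, 1]

ranking = fuse(docs, initial, model)
print(ranking)  # ['d2', 'd4', 'd1', 'd3', 'd5']
```

In practice the weight is usually tuned on validation data; if the model is much stronger than the keyword signal, a weight close to 1.0 may rank better than an even average.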