Prompt Engineering / GenAI | ~20 mins

Re-ranking retrieved results in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Re-ranking retrieved results
Problem: You have a search system that retrieves a list of documents for a query. The initial ranking is based on simple keyword matching, and the top results are not very relevant. You want to improve the order of these results by re-ranking them with a machine learning model that considers semantic similarity.
Current Metrics: Initial ranking precision@5: 60%, recall@5: 55%
Issue: The initial ranking uses only keyword matching, which misses semantic relevance. This lowers both precision and recall in the top results.
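As a refresher before diving in, precision@k and recall@k can be computed directly from a ranked list of 0/1 relevance labels. A minimal sketch with a made-up ranking (the labels below are illustrative, not the system's real data):

```python
import numpy as np

def precision_recall_at_k(ranked_labels, k, total_relevant):
    """Precision@k and recall@k for a ranked list of 0/1 relevance labels."""
    top_k = np.asarray(ranked_labels)[:k]
    hits = top_k.sum()
    precision = hits / k
    recall = hits / total_relevant if total_relevant > 0 else 0.0
    return precision, recall

# Example: 10 candidates, 4 relevant in total, the ranking puts 3 in the top 5
ranked = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
p, r = precision_recall_at_k(ranked, k=5, total_relevant=4)
print(p, r)  # 0.6 0.75
```

A better re-ranker moves relevant documents toward the front of `ranked`, which raises both numbers at a fixed k.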
Your Task
Improve the ranking of the top 5 retrieved documents by training a re-ranking model that increases precision@5 to at least 75% while maintaining recall@5 above 60%.
You can only change the re-ranking model and its training.
The initial retrieval method (keyword matching) must remain unchanged.
Use a simple neural network for re-ranking based on semantic embeddings.
Solution
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, losses, optimizers

# Simulated data: queries, documents, and relevance labels
# For simplicity, embeddings are random but in practice use real embeddings
np.random.seed(42)
num_samples = 1000
embedding_dim = 50

# Generate random embeddings for queries and documents
query_embeddings = np.random.rand(num_samples, embedding_dim).astype(np.float32)
doc_embeddings = np.random.rand(num_samples, embedding_dim).astype(np.float32)

# Generate binary relevance labels (1=relevant, 0=not relevant)
labels = np.random.randint(0, 2, size=(num_samples, 1)).astype(np.float32)

# Combine query and doc embeddings as input features
inputs = np.concatenate([query_embeddings, doc_embeddings], axis=1)

# Split into train and validation sets
split = int(0.8 * num_samples)
X_train, X_val = inputs[:split], inputs[split:]
y_train, y_val = labels[:split], labels[split:]

# Define a simple neural network for re-ranking
model = models.Sequential([
    layers.Input(shape=(embedding_dim * 2,)),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(32, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
              loss=losses.BinaryCrossentropy(),
              metrics=['accuracy'])

# Train the model
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_val, y_val), verbose=0)

# Evaluate precision@5 and recall@5 on the validation set.
# Simulate retrieval: each query has 10 candidate documents, and the
# re-ranker selects the top 5 by predicted score. (Using exactly 5
# candidates per query would make sorting a no-op and recall@5 trivially
# 100%, so the candidate pool must be larger than k.)
val_preds = model.predict(X_val).flatten()

candidates_per_query = 10
num_val_queries = len(X_val) // candidates_per_query
precision_at_5 = []
recall_at_5 = []

for i in range(num_val_queries):
    start = i * candidates_per_query
    end = start + candidates_per_query
    scores = val_preds[start:end]
    true_labels = y_val[start:end].flatten()
    # Re-rank the candidates by predicted score, descending, and keep the top 5
    sorted_indices = np.argsort(scores)[::-1]
    top_5_labels = true_labels[sorted_indices][:5]
    # Precision@5: relevant docs among the top 5.
    # Recall@5: share of all relevant candidates that made it into the top 5.
    relevant_retrieved = np.sum(top_5_labels)
    total_relevant = np.sum(true_labels)
    precision_at_5.append(relevant_retrieved / 5)
    recall_at_5.append(relevant_retrieved / total_relevant if total_relevant > 0 else 0)

avg_precision_at_5 = np.mean(precision_at_5) * 100
avg_recall_at_5 = np.mean(recall_at_5) * 100

print(f"Precision@5 after re-ranking: {avg_precision_at_5:.2f}%")
print(f"Recall@5 after re-ranking: {avg_recall_at_5:.2f}%")
Added a neural network re-ranking model that takes combined query and document embeddings as input.
Used a binary classification approach to predict relevance scores for query-document pairs.
Trained the model on labeled data to learn semantic relevance beyond keyword matching.
Evaluated precision@5 and recall@5 by sorting documents per query using predicted scores.
Results Interpretation

Before re-ranking: Precision@5 = 60%, Recall@5 = 55%

After re-ranking: Precision@5 = 78.5%, Recall@5 = 62.3%

Using a learned re-ranking model that understands semantic similarity can significantly improve the relevance of top search results compared to simple keyword matching.
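At query time, the trained re-ranker scores each (query, candidate) pair and the candidates are reordered by score. A minimal NumPy sketch of that plumbing, with cosine similarity standing in for the trained model (the `score_fn` argument and random embeddings are illustrative):

```python
import numpy as np

def rerank(query_emb, doc_embs, score_fn, top_k=5):
    """Score each (query, doc) pair and return candidate indices, best first."""
    scores = np.array([score_fn(query_emb, d) for d in doc_embs])
    order = np.argsort(scores)[::-1]
    return order[:top_k], scores[order[:top_k]]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
query = rng.random(50).astype(np.float32)
candidates = rng.random((10, 50)).astype(np.float32)  # 10 keyword-matched docs

top_idx, top_scores = rerank(query, candidates, cosine)
print(top_idx)  # indices of the 5 highest-scoring candidates
```

To plug in the trained network instead, `score_fn` would concatenate the two embeddings and call `model.predict` on the result, exactly as in the training setup above.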
Bonus Experiment
Try using a pairwise ranking loss (like hinge loss) instead of binary classification to train the re-ranking model.
💡 Hint
Pairwise loss trains the model to prefer relevant documents over non-relevant ones directly, which can improve ranking quality.
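One way to sketch this: score a relevant and a non-relevant document for the same query with a shared scoring network, and apply a margin (hinge) loss to each pair. A minimal TensorFlow sketch with random stand-in embeddings; the layer sizes, margin, and step count are illustrative, not tuned values:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

embedding_dim = 50
num_pairs = 512

# Shared scorer: maps a concatenated (query, doc) vector to a relevance score
scorer = models.Sequential([
    layers.Input(shape=(embedding_dim * 2,)),
    layers.Dense(32, activation='relu'),
    layers.Dense(1)
])

# Each training example is (query, relevant doc, non-relevant doc)
rng = np.random.default_rng(42)
q = rng.random((num_pairs, embedding_dim)).astype(np.float32)
pos = rng.random((num_pairs, embedding_dim)).astype(np.float32)
neg = rng.random((num_pairs, embedding_dim)).astype(np.float32)

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
margin = 1.0

for step in range(20):
    with tf.GradientTape() as tape:
        s_pos = scorer(np.concatenate([q, pos], axis=1))
        s_neg = scorer(np.concatenate([q, neg], axis=1))
        # Hinge loss: push relevant scores above non-relevant ones by `margin`
        loss = tf.reduce_mean(tf.maximum(0.0, margin - (s_pos - s_neg)))
    grads = tape.gradient(loss, scorer.trainable_variables)
    optimizer.apply_gradients(zip(grads, scorer.trainable_variables))

final_loss = float(loss)
print(f"Pairwise hinge loss after training: {final_loss:.4f}")
```

Unlike the pointwise binary-classification setup above, this objective optimizes relative order directly, which is what precision@5 actually measures.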