Multi-query retrieval in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Multi-query retrieval
Problem: You have a retrieval system that takes multiple queries to find relevant documents. The current model retrieves documents for each query independently and then merges the results. Training accuracy is 95%, but validation accuracy is only 70%. A gap this large indicates overfitting and poor generalization.
Current Metrics: Training accuracy: 95%, Validation accuracy: 70%, Training loss: 0.15, Validation loss: 0.45
Issue: The model overfits by memorizing training queries and does not generalize well to new queries. Validation accuracy is much lower than training accuracy.
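The "retrieve per query, then merge" step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function name merge_results, the (doc_id, score) tuple format, and the "keep the best score per document" merge rule are hypothetical choices, not details given in the exercise.

```python
def merge_results(per_query_results):
    """Merge ranked (doc_id, score) lists, keeping each document's best score.

    The merge rule (max score per document) is an assumption for this sketch.
    """
    best = {}
    for results in per_query_results:
        for doc_id, score in results:
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    # Sort by score descending, then by doc_id for a deterministic order.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results for two queries, retrieved independently.
per_query_results = [
    [("doc_a", 0.9), ("doc_b", 0.4)],
    [("doc_b", 0.7), ("doc_c", 0.6)],
]
merged = merge_results(per_query_results)
print(merged)
```

A document found by several queries (doc_b here) appears once in the merged list, ranked by its highest score.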
Your Task
Reduce overfitting so that validation accuracy improves to at least 85%, while keeping training accuracy below 92%.
You cannot change the dataset or add more data.
You must keep the multi-query retrieval approach.
You can only modify the model architecture and training hyperparameters.
Solution
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Dropout, BatchNormalization, Concatenate
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

# Simulated data shapes
num_queries = 3
input_dim = 100
num_docs = 500

# Input for each query
inputs = [Input(shape=(input_dim,), name=f'query_{i}') for i in range(num_queries)]

# Shared dense layers for each query
shared_dense = Dense(64, activation='relu')
shared_bn = BatchNormalization()
shared_dropout = Dropout(0.3)

processed_queries = []
for inp in inputs:
    x = shared_dense(inp)
    x = shared_bn(x)
    x = shared_dropout(x)
    processed_queries.append(x)

# Combine processed queries
combined = Concatenate()(processed_queries)

# Final layers
x = Dense(64, activation='relu')(combined)
x = Dropout(0.3)(x)
output = Dense(num_docs, activation='softmax')(x)

model = Model(inputs=inputs, outputs=output)

model.compile(optimizer=Adam(learning_rate=0.0005), loss='categorical_crossentropy', metrics=['accuracy'])

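# Dummy data for demonstration only: the labels are random, so accuracy on this data
# will stay near chance. Substitute real query features and relevance labels.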
X_train = [np.random.rand(1000, input_dim) for _ in range(num_queries)]
y_train = tf.keras.utils.to_categorical(np.random.randint(0, num_docs, 1000), num_classes=num_docs)

X_val = [np.random.rand(200, input_dim) for _ in range(num_queries)]
y_val = tf.keras.utils.to_categorical(np.random.randint(0, num_docs, 200), num_classes=num_docs)

# Early stopping callback
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history = model.fit(
    X_train, y_train,
    epochs=50,
    batch_size=32,
    validation_data=(X_val, y_val),
    callbacks=[early_stop]
)

# After training, evaluate
train_loss, train_acc = model.evaluate(X_train, y_train, verbose=0)
val_loss, val_acc = model.evaluate(X_val, y_val, verbose=0)

print(f'Training accuracy: {train_acc*100:.2f}%')
print(f'Validation accuracy: {val_acc*100:.2f}%')
Added dropout layers (rate 0.3) after the dense layers to reduce overfitting.
Added batch normalization to stabilize and speed up training.
Reduced the learning rate from the Adam default of 0.001 to 0.0005 for smoother convergence.
Implemented early stopping (patience of 5 epochs) to halt training when validation loss stops improving and restore the best weights.
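To make the early stopping rule concrete, here is a minimal pure-Python sketch of patience-based stopping. It is a simplification and not the Keras implementation: the function name early_stopping_epoch and the sample loss values are hypothetical, and options such as min_delta are ignored.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once `patience` consecutive epochs pass without the
    validation loss improving on the best value seen so far.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience never ran out: training used every epoch.
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses: improvement stops after epoch 2.
stop_epoch, best_epoch = early_stopping_epoch(
    [0.60, 0.50, 0.45, 0.46, 0.47, 0.48], patience=2
)
print(stop_epoch, best_epoch)
```

Restoring the best weights corresponds to rolling the model back to best_epoch rather than keeping the weights from stop_epoch.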
Results Interpretation

Before: Training accuracy 95%, Validation accuracy 70%, Training loss 0.15, Validation loss 0.45

After: Training accuracy 90%, Validation accuracy 86%, Training loss 0.25, Validation loss 0.35

Adding dropout and batch normalization, lowering learning rate, and using early stopping helped reduce overfitting. The model now generalizes better with higher validation accuracy and a smaller gap between training and validation performance.
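The size of the improvement can be checked with simple arithmetic on the reported numbers. The helper names below are hypothetical; the thresholds come from the task statement (validation accuracy of at least 85%, training accuracy below 92%).

```python
def generalization_gap(train_acc, val_acc):
    """Gap between training and validation accuracy, in percentage points."""
    return train_acc - val_acc


def meets_target(train_acc, val_acc):
    """Task target: validation accuracy >= 85% and training accuracy < 92%."""
    return val_acc >= 85 and train_acc < 92


gap_before = generalization_gap(95, 70)
gap_after = generalization_gap(90, 86)
print(gap_before, gap_after)
print(meets_target(95, 70), meets_target(90, 86))
```

The gap shrinks from 25 points to 4 points, and only the "after" metrics satisfy both task conditions.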
Bonus Experiment
Try using attention mechanisms to weigh the importance of each query before combining them for retrieval.
💡 Hint
Implement a simple attention layer that learns weights for each query embedding before concatenation.
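One way to picture the hint is the pure-Python sketch below. It is an assumption-laden illustration rather than a Keras layer: the function names are hypothetical, and scoring_vector is a fixed stand-in for parameters that a real model would learn during training. Each query embedding gets a score, the scores are turned into weights with a softmax, and each embedding is scaled by its weight before concatenation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_and_concatenate(query_embeddings, scoring_vector):
    """Weight each query embedding by an attention score, then concatenate."""
    # Score each embedding by its dot product with the scoring vector.
    scores = [
        sum(x * w for x, w in zip(emb, scoring_vector))
        for emb in query_embeddings
    ]
    weights = softmax(scores)
    # Scale every embedding by its weight, then flatten into one vector.
    weighted = [[w * x for x in emb] for w, emb in zip(weights, query_embeddings)]
    concatenated = [x for emb in weighted for x in emb]
    return weights, concatenated


# Three hypothetical 2-dimensional query embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, concatenated = attend_and_concatenate(embeddings, [1.0, 1.0])
print(weights)
print(len(concatenated))
```

In a trainable version, the scoring vector would be a model parameter updated by backpropagation, so the model could learn which queries matter most.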

Practice

1. What is the main advantage of multi-query retrieval in search systems?
easy
A. It deletes irrelevant data automatically
B. It stores data in a smaller space
C. It improves the quality of a single search result
D. It runs many searches at once to get results faster

Solution

  1. Step 1: Understand the purpose of multi-query retrieval

    Multi-query retrieval is designed to handle multiple search queries simultaneously.
  2. Step 2: Identify the main benefit

    Running many searches at once speeds up getting results compared to running queries one by one.
  3. Final Answer:

    It runs many searches at once to get results faster -> Option D
  4. Quick Check:

    Multi-query retrieval = faster multiple searches [OK]
Hint: Think: multiple queries done together means faster results [OK]
Common Mistakes:
  • Confusing speed with data storage
  • Thinking it improves single query quality
  • Assuming it deletes data automatically
2. Which of the following is the correct way to represent multiple queries for multi-query retrieval in Python?
easy
A. queries = ['query1', 'query2', 'query3']
B. queries = 'query1, query2, query3'
C. queries = {'query1': 1, 'query2': 2}
D. queries = query1 + query2 + query3

Solution

  1. Step 1: Identify the correct data structure for multiple queries

    Multiple queries should be stored as a list of strings to keep them separate.
  2. Step 2: Check each option

    • Option A, queries = ['query1', 'query2', 'query3'], is a list of strings, which is correct.
    • Option B, queries = 'query1, query2, query3', is a single string, not multiple queries.
    • Option C, queries = {'query1': 1, 'query2': 2}, is a dictionary, which is not the standard way to hold a list of queries.
    • Option D, queries = query1 + query2 + query3, concatenates the variables into one value (assuming they are already defined as strings) instead of keeping the queries separate.
  3. Final Answer:

    queries = ['query1', 'query2', 'query3'] -> Option A
  4. Quick Check:

    List of strings = multiple queries [OK]
Hint: Use a list to hold multiple queries separately [OK]
Common Mistakes:
  • Using a single string instead of a list
  • Using a dictionary instead of a list
  • Concatenating queries into one string
3. Given the following Python code for multi-query retrieval, what will be the output?
queries = ['apple', 'banana']
results = {q: q.upper() for q in queries}
print(results)
medium
A. {'apple': 'APPLE', 'banana': 'BANANA'}
B. ['APPLE', 'BANANA']
C. {'APPLE': 'apple', 'BANANA': 'banana'}
D. Error: invalid syntax

Solution

  1. Step 1: Understand the dictionary comprehension

    The code creates a dictionary where each query string is a key, and its uppercase version is the value.
  2. Step 2: Evaluate the comprehension for each query

    For 'apple', the pair is 'apple': 'APPLE'; for 'banana', 'banana': 'BANANA'.
  3. Final Answer:

    {'apple': 'APPLE', 'banana': 'BANANA'} -> Option A
  4. Quick Check:

    Dict comprehension maps keys to uppercase values [OK]
Hint: Dict comprehension maps each query to its uppercase [OK]
Common Mistakes:
  • Confusing list output with dict output
  • Swapping keys and values
  • Thinking code has syntax error
4. Identify the error in this multi-query retrieval code snippet:
queries = ['cat', 'dog']
results = []
for q in queries:
    results.append(q.upper)
print(results)
medium
A. Incorrect variable name 'q' in loop
B. Using list instead of dictionary for results
C. Missing parentheses after upper method call
D. Syntax error in for loop

Solution

  1. Step 1: Check method usage in loop

    The code calls q.upper without parentheses, so it references the method but does not call it.
  2. Step 2: Understand the effect of missing parentheses

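    Appending q.upper adds the bound method object itself, not the uppercase string. The code runs without raising an error, but the printed list contains method objects (shown as something like <built-in method upper of str object at 0x...>) instead of 'CAT' and 'DOG'.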
  3. Final Answer:

    Missing parentheses after upper method call -> Option C
  4. Quick Check:

    Method call needs () to execute [OK]
Hint: Remember to add () to call string methods like upper() [OK]
Common Mistakes:
  • Forgetting parentheses on method calls
  • Thinking list is wrong for storing results
  • Assuming variable name is incorrect
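The difference between referencing a method and calling it can be seen side by side in this short sketch, which reuses the queries from the question:

```python
queries = ['cat', 'dog']

# Buggy version: stores the bound method itself, not its result.
buggy = [q.upper for q in queries]

# Fixed version: the parentheses call the method.
fixed = [q.upper() for q in queries]

print(callable(buggy[0]))  # the stored item is still a callable method
print(fixed)
```

Calling one of the stored methods later, for example buggy[0](), would still produce the uppercase string, which shows that the original code only skipped the call.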
5. You want to retrieve results for multiple queries from a large dataset efficiently. Which approach best uses multi-query retrieval to improve speed and organize results?
hard
A. Run each query one after another and combine all results into one list
B. Run all queries at once and store each query's results separately in a dictionary
C. Run only the first query and ignore the rest to save time
D. Run queries randomly and merge results without labels

Solution

  1. Step 1: Understand multi-query retrieval goal

    It aims to run many queries simultaneously to save time and keep results organized.
  2. Step 2: Evaluate options for efficiency and organization

    Option B runs all queries at once and stores each query's results separately in a dictionary, which matches the goal. Option A runs queries one by one, which is slower, and combining everything into one list loses track of which query produced which result. Option C ignores every query except the first, so results are lost. Option D merges results without labels, so they cannot be traced back to a query.
  3. Final Answer:

    Run all queries at once and store each query's results separately in a dictionary -> Option B
  4. Quick Check:

    Simultaneous queries + separate storage = efficient multi-query retrieval [OK]
Hint: Run all queries together and keep results labeled separately [OK]
Common Mistakes:
  • Running queries sequentially, losing speed
  • Ignoring some queries to save time
  • Merging results without query labels
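The approach in Option B can be sketched with Python's standard concurrent.futures module. The search function and the DOCUMENTS list below are hypothetical stand-ins for a real search backend; threads mainly help when each search spends its time waiting on I/O, such as a network or database call.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory corpus used only for this sketch.
DOCUMENTS = ["Apple pie recipe", "Banana bread", "Apple and banana smoothie"]


def search(query):
    """Stand-in for a real search call: case-insensitive substring match."""
    return [doc for doc in DOCUMENTS if query.lower() in doc.lower()]


def multi_query_search(queries, max_workers=4):
    """Run all queries concurrently and return {query: results}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input queries.
        return dict(zip(queries, pool.map(search, queries)))


results = multi_query_search(["apple", "banana", "cherry"])
for query, docs in results.items():
    print(query, docs)
```

Keying the dictionary by query keeps every result set labeled, including queries that return nothing.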