
Multimodal RAG in Prompt Engineering / GenAI - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
What is the main advantage of using Multimodal Retrieval-Augmented Generation (RAG)?

Imagine you have a smart assistant that can answer questions using both text and images. What is the key benefit of combining multiple types of data (like text and images) in a RAG system?

A. It reduces the need for training data by using only images.
B. It makes the system faster by ignoring irrelevant data types.
C. It allows the system to understand and generate answers using richer information from different data types.
D. It limits the system to only text-based answers for simplicity.
💡 Hint

Think about how combining different senses helps humans understand better.

Model Choice
intermediate
Which model architecture is best suited for encoding both images and text in a Multimodal RAG system?

You want to build a Multimodal RAG system that can understand images and text together. Which model architecture should you choose to encode both types of data effectively?

A. A recurrent neural network (RNN) trained on text sequences only.
B. A single text-only transformer model trained on text captions of images.
C. A convolutional neural network (CNN) trained only on images without text input.
D. A dual-encoder model with separate encoders for images and text that produce embeddings in the same space.
💡 Hint

Think about how to represent different data types in a way that they can be compared or combined.
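
To make the idea of a shared embedding space concrete, here is a minimal sketch. It assumes hypothetical feature sizes and uses untrained random projections (`W_image`, `W_text`) as stand-ins for real image and text encoders, so the similarity values themselves carry no meaning. It only shows that once both modalities are mapped to vectors of the same size, they can be compared directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 8-dim image features, 5-dim text features,
# both projected into a shared 4-dim embedding space.
IMAGE_DIM, TEXT_DIM, SHARED_DIM = 8, 5, 4

# Stand-ins for trained encoders: random (untrained) linear projections.
W_image = rng.normal(size=(IMAGE_DIM, SHARED_DIM))
W_text = rng.normal(size=(TEXT_DIM, SHARED_DIM))


def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def encode_image(features):
    return l2_normalize(features @ W_image)


def encode_text(features):
    return l2_normalize(features @ W_text)


# Made-up input features: 3 images and 2 text queries.
image_emb = encode_image(rng.normal(size=(3, IMAGE_DIM)))  # shape (3, 4)
text_emb = encode_text(rng.normal(size=(2, TEXT_DIM)))     # shape (2, 4)

# Cosine similarity of every text query against every image.
similarity = text_emb @ image_emb.T
print(similarity.shape)  # (2, 3)
```

In a real system the two projections would be replaced by encoders trained together so that matching image and text pairs land close to each other.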

Predict Output
advanced
What is the output of this embedding similarity code snippet?

Given the following Python code that computes cosine similarity between image and text embeddings, what is the printed output?

import numpy as np
from numpy.linalg import norm

image_embedding = np.array([0.6, 0.8])
text_embedding = np.array([0.9, 0.1])

cosine_similarity = np.dot(image_embedding, text_embedding) / (norm(image_embedding) * norm(text_embedding))
print(round(cosine_similarity, 2))
A. 0.68
B. 0.75
C. 0.80
D. 0.50
💡 Hint

Recall cosine similarity formula: dot product divided by product of norms.
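
As a refresher on the formula in the hint, the sketch below wraps it in a small helper and applies it to vectors that are not the ones from the question. The function name `cosine_similarity` and the sample vectors are illustrative choices, not part of the original exercise.

```python
import numpy as np


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their L2 norms."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Same direction, different lengths: similarity is 1.0
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))

# Orthogonal vectors: similarity is 0.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))

# dot = 24, norms = 5 and 5, so 24 / 25 = 0.96
print(round(cosine_similarity([3.0, 4.0], [4.0, 3.0]), 2))
```

Because the result depends only on direction, scaling either vector leaves the score unchanged.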

Metrics
advanced
Which metric best evaluates the retrieval quality in a Multimodal RAG system?

You want to measure how well your Multimodal RAG system retrieves relevant documents (text or images) for a query. Which metric should you use?

A. Recall@K, which measures if the correct item is in the top K retrieved results.
B. Accuracy of classification labels on a test set.
C. BLEU score comparing generated text to reference text.
D. Mean Squared Error (MSE) between embeddings.
💡 Hint

Think about how to check if the system finds the right items among its top guesses.
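
One way to check the top guesses is sketched below. It assumes each query comes with a known set of relevant item IDs and computes the fraction of them found in the top K results. With a single relevant item per query this reduces to "is the correct item in the top K". The helper name `recall_at_k` and the item IDs are made up for illustration.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant items that appear in the top-k retrieved results."""
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(retrieved_ids[:k])
    hits = sum(1 for item in relevant_ids if item in top_k)
    return hits / len(relevant_ids)


# Hypothetical ranked results mixing images and text documents.
retrieved = ["img_7", "doc_2", "img_3", "doc_9", "img_1"]
relevant = {"img_3", "img_1"}

print(recall_at_k(retrieved, relevant, k=3))  # 0.5 (only img_3 is in the top 3)
print(recall_at_k(retrieved, relevant, k=5))  # 1.0 (both relevant items found)
```

In practice the score is usually averaged over many queries rather than reported for one.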

🔧 Debug
expert
Why does this Multimodal RAG system fail to retrieve relevant images?

Consider this simplified retrieval code snippet for a Multimodal RAG system. Why does it fail to retrieve relevant images?

def retrieve(query_embedding, image_embeddings):
    # Returns index of image with max dot product similarity
    similarities = [sum(q * i for q, i in zip(query_embedding, img)) for img in image_embeddings]
    return similarities.index(max(similarities))

query = [0.5, 0.5]
images = [[0.6, 0.8], [0.9, 0.1], [0.1, 0.9]]
result = retrieve(query, images)
print(result)
A. The code incorrectly returns the minimum similarity index instead of maximum.
B. The code ranks by raw dot product without normalizing the embeddings, so longer vectors can outrank better-aligned ones.
C. The code uses sum instead of product in similarity calculation, causing a TypeError.
D. The query embedding has wrong dimensions compared to image embeddings.
💡 Hint

Think about how cosine similarity differs from dot product and why normalization matters.
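
The image vectors in the snippet above have similar lengths, so the difference between the two measures is easier to see with vectors of very different magnitude. The sketch below uses made-up embeddings (not the ones from the question) in which a long but poorly aligned image vector wins on dot product, while cosine similarity picks the vector that points in the same direction as the query. The helper names are illustrative.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Dot product after dividing out both vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def argmax(scores):
    return scores.index(max(scores))


# Hypothetical embeddings: image 0 is long but poorly aligned with the query,
# image 1 points in exactly the same direction as the query.
query = [1.0, 1.0]
images = [[3.0, 0.0], [1.0, 1.0]]

dot_scores = [dot(query, img) for img in images]     # [3.0, 2.0]
cos_scores = [cosine(query, img) for img in images]  # roughly [0.707, 1.0]

print(argmax(dot_scores))  # 0: magnitude dominates
print(argmax(cos_scores))  # 1: direction decides
```

Normalizing every embedding to unit length once, at indexing time, makes a plain dot product equivalent to cosine similarity.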