Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)
Complete the code to load a pretrained multimodal retriever model.
Prompt Engineering / GenAI
from transformers import [1]
retriever = [1].from_pretrained('multimodal-rag-base')
Common Mistakes
Using RagTokenizer instead of RagRetriever.
Using RagConfig, which only holds configuration.
Explanation: The RagRetriever class loads a pretrained retriever model for RAG architectures. RagTokenizer only tokenizes text, and RagConfig only holds configuration.
2. Fill in the blank (medium)
Complete the code to encode an image input for the multimodal retriever.
from PIL import Image
image = Image.open('input.jpg')
inputs = retriever.image_encoder.[1](image, return_tensors='pt')
Common Mistakes
Using encode, which is generic and may not exist.
Calling forward directly without preprocessing the image.
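Explanation: In this exercise, encode_image converts the image into model-ready tensors for retrieval. The result is a dict whose pixel_values entry is passed to the retriever in the next task.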
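The next task reads the result back as inputs['pixel_values'], which suggests a preprocessing step. As a rough illustration of what such a step typically does, here is a minimal sketch in plain Python. The function name to_pixel_values and the mean and std values of 0.5 are assumptions made for illustration; they are not taken from the exercise or from any library.

```python
def to_pixel_values(image_rows, mean=0.5, std=0.5):
    """Scale 0-255 intensities to [0, 1], then normalize with mean and std.

    Hypothetical helper: the name and the default mean/std are assumptions.
    """
    return [[(pixel / 255.0 - mean) / std for pixel in row] for row in image_rows]


# A 2x2 grayscale "image", invented for illustration.
image = [[0, 255], [51, 204]]

# Mirror the dict shape the exercise indexes with inputs['pixel_values'].
inputs = {"pixel_values": to_pixel_values(image)}
print(inputs["pixel_values"][0])
```

A real image processor would usually also resize the image and return framework tensors rather than nested lists; the sketch only shows the scaling and normalization idea.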
3. Fill in the blank (hard)
Fix the error in the code to retrieve documents using the multimodal retriever.
retrieved_docs = retriever.[1](inputs['pixel_values'], top_k=5)
Common Mistakes
Using retrieve, which is not the method this exercise expects.
Using retrieve_docs, which does not exist.
Explanation: In this exercise, get_relevant_documents is the method that returns relevant documents from the retriever. The top_k=5 argument limits the result to the five best matches.
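To make the top_k idea concrete, here is a minimal sketch of ranking documents by similarity and keeping the best matches. It assumes documents and the query are already represented as embedding vectors and uses cosine similarity; the exercise does not say how its retriever scores documents, so both the scoring choice and the names cosine_similarity and top_k_documents are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # avoid dividing by zero for all-zero vectors
    return dot / (norm_a * norm_b)


def top_k_documents(query_vec, doc_vecs, top_k=5):
    """Return (doc_id, score) pairs for the top_k most similar documents."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 2-D embeddings, invented for illustration.
docs = {
    "cat_photo": [1.0, 0.0],
    "dog_photo": [0.8, 0.6],
    "tax_form": [0.0, 1.0],
}
results = top_k_documents([1.0, 0.1], docs, top_k=2)
print([doc_id for doc_id, _ in results])
```

Production retrievers usually replace the full sort with an approximate nearest-neighbor index, but the contract is the same: a query goes in and the top_k highest-scoring documents come out.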
4. Fill in the blank (hard)
Fill both blanks to create a multimodal RAG model and generate an answer.
from transformers import RagSequenceForGeneration
model = RagSequenceForGeneration.from_pretrained('multimodal-rag-base')
outputs = model.generate([1], [2]=retrieved_docs)
Common Mistakes
Passing retrieved_docs as a positional argument instead of context_input_ids.
Using wrong keyword arguments.
Explanation: The generate method takes the question as input_ids and the retrieved documents through the context_input_ids keyword argument.
5. Fill in the blank (hard)
Fill all three blanks to prepare inputs, retrieve documents, and generate answers in a multimodal RAG pipeline.
question = 'What is shown in the image?'
inputs = retriever.question_encoder.tokenizer(question, return_tensors='pt')
image_inputs = retriever.image_encoder.[1](image, return_tensors='pt')
retrieved_docs = retriever.get_relevant_documents([2])
outputs = model.generate(input_ids=inputs['input_ids'], [3]=retrieved_docs)
Common Mistakes
Passing the whole image_inputs dict instead of pixel_values.
Using input_ids instead of context_input_ids for retrieved docs.
Explanation: First, encode the image with encode_image. Then pass image_inputs['pixel_values'] to get_relevant_documents. Finally, pass the retrieved documents to generate as context_input_ids.
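The three steps can be traced end to end with toy stand-ins. The sketch below is not the exercise's library: ToyImageEncoder, ToyRetriever, and ToyModel are hypothetical classes written only to mirror the method names used in this task, with plain lists in place of tensors and a word split in place of a real tokenizer. It also shows why passing the whole image_inputs dict fails.

```python
class ToyImageEncoder:
    """Hypothetical stand-in for retriever.image_encoder."""

    def encode_image(self, image, return_tensors="pt"):
        # return_tensors is accepted only to mirror the exercise's call;
        # this toy version always returns plain Python lists.
        return {"pixel_values": [pixel / 255.0 for pixel in image]}


class ToyRetriever:
    """Hypothetical stand-in for the exercise's multimodal retriever."""

    def __init__(self, index):
        self.image_encoder = ToyImageEncoder()
        self.index = index  # maps document text -> reference pixel values

    def get_relevant_documents(self, pixel_values, top_k=5):
        if isinstance(pixel_values, dict):
            raise TypeError("pass image_inputs['pixel_values'], not the whole dict")

        def distance(doc):
            # Squared Euclidean distance to the document's reference pixels.
            ref = self.index[doc]
            return sum((a - b) ** 2 for a, b in zip(pixel_values, ref))

        return sorted(self.index, key=distance)[:top_k]


class ToyModel:
    """Hypothetical stand-in for the generator."""

    def generate(self, input_ids, *, context_input_ids):
        # Keyword-only, so retrieved documents cannot be passed positionally.
        return f"{len(input_ids)} question tokens; context: {context_input_ids[0]}"


retriever = ToyRetriever({
    "a bright beach scene": [0.9, 0.9, 0.8],
    "a dark forest at night": [0.1, 0.2, 0.1],
})
model = ToyModel()

question = "What is shown in the image?"
inputs = {"input_ids": question.split()}  # toy tokenizer: one id per word
image = [230, 240, 200]                   # toy "image": three bright pixels

image_inputs = retriever.image_encoder.encode_image(image, return_tensors="pt")
retrieved_docs = retriever.get_relevant_documents(image_inputs["pixel_values"], top_k=1)
outputs = model.generate(input_ids=inputs["input_ids"], context_input_ids=retrieved_docs)
print(outputs)
```

Making context_input_ids keyword-only in the toy model is a design choice that turns the "positional argument" mistake from the earlier task into an immediate TypeError instead of a silent mix-up.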