Open-domain QA basics in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
What is the main difference between open-domain QA and closed-domain QA?

Open-domain QA systems answer questions from any topic using a large knowledge source. Closed-domain QA systems focus on a specific topic or dataset.

Which statement best describes the main difference?

A. Closed-domain QA uses large external databases, while open-domain QA only uses small, fixed datasets.
B. Open-domain QA requires no training, but closed-domain QA always requires training on labeled data.
C. Open-domain QA can answer questions about any topic, while closed-domain QA is limited to a specific subject area.
D. Closed-domain QA systems are always faster than open-domain QA systems.
💡 Hint

Think about the scope of questions each system can handle.

Predict Output
intermediate
What is the output of this simple retrieval step in open-domain QA?

Given a list of documents and a query, the code below keeps every document that contains the query string as a substring.

documents = ["The sky is blue.", "Grass is green.", "The sun is bright."]
query = "sky"
retrieved = [doc for doc in documents if query in doc]
print(retrieved)

What is printed?

A. ["The sky is blue."]
B. ["Grass is green."]
C. ["The sun is bright."]
D. []
💡 Hint

Check which document contains the word 'sky'.

Model Choice
advanced
Which model architecture is best suited for open-domain QA tasks?

Open-domain QA often requires understanding questions and retrieving answers from large text collections.

Which model architecture is most appropriate?

A. A simple linear regression model.
B. A convolutional neural network trained on image classification.
C. A recurrent neural network trained for speech recognition.
D. A sequence-to-sequence transformer model fine-tuned for question answering.
💡 Hint

Consider models designed for text understanding and generation.

Metrics
advanced
Which metric is most appropriate to evaluate an open-domain QA system's answer quality?

Open-domain QA systems produce text answers to questions. Which metric below best measures answer correctness?

A. Accuracy of classifying images into categories.
B. Exact Match (EM) score, which measures whether the predicted answer exactly matches the ground truth.
C. BLEU score used for machine translation quality.
D. Mean Squared Error (MSE) between predicted and true numerical values.
💡 Hint

Think about metrics that compare predicted text answers to correct answers.
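
For readers who want to see what comparing a predicted text answer to a reference answer can look like in code, here is a minimal sketch of an Exact Match computation. The normalization step (lowercasing, stripping punctuation and English articles) is an assumption modeled on common SQuAD-style evaluation practice, and the function names and sample data are hypothetical.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, ground_truth: str) -> int:
    """Return 1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(prediction) == normalize_answer(ground_truth))


def exact_match_score(predictions, ground_truths) -> float:
    """Average exact-match value over paired predictions and references."""
    matches = [exact_match(p, g) for p, g in zip(predictions, ground_truths)]
    return sum(matches) / len(matches)


# Hypothetical sample data, for illustration only.
predictions = ["Paris", "the Eiffel Tower", "Mount Everest."]
ground_truths = ["paris", "Eiffel Tower", "K2"]

em = exact_match_score(predictions, ground_truths)
print(em)  # two of the three predictions match after normalization
```

EM is strict: a prediction that is correct but phrased differently from the reference scores zero, which is why it is often reported alongside a softer token-overlap score.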

🔧 Debug
expert
Why does this open-domain QA retrieval code return an empty list?

Consider this code snippet for retrieving documents containing a query word:

documents = ["The sky is blue.", "Grass is green.", "The sun is bright."]
query = "Sky"
retrieved = [doc for doc in documents if query in doc]
print(retrieved)

Why is the output an empty list?

A. Because the query 'Sky' has an uppercase 'S' but the documents contain lowercase 'sky', so the match fails due to case sensitivity.
B. Because the documents list is empty.
C. Because the query word is not a string.
D. Because the list comprehension syntax is incorrect.
💡 Hint

Check if the query matches the document text exactly including letter case.
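
If letter case turns out to be the culprit, one possible repair is to compare case-folded copies of both strings. This is a sketch of that idea, not the only fix; the variable name `needle` is introduced here for illustration.

```python
documents = ["The sky is blue.", "Grass is green.", "The sun is bright."]
query = "Sky"

# casefold() is a stronger form of lower() meant for caseless comparison.
needle = query.casefold()

# Compare folded copies, but keep the original document text in the result.
retrieved = [doc for doc in documents if needle in doc.casefold()]
print(retrieved)  # ['The sky is blue.']
```

Folding only for the comparison keeps the original capitalization in the returned documents, which matters if a reader model later extracts an answer span from them.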

Practice

1. What is the main goal of open-domain question answering (QA)?
easy
A. To summarize a single document
B. To translate text from one language to another
C. To find answers to any question from a large collection of texts
D. To generate new text based on a prompt

Solution

  1. Step 1: Understand the definition of open-domain QA

    Open-domain QA aims to answer questions using a wide range of texts, not limited to a specific topic.
  2. Step 2: Compare options with this definition

    Only option C, "To find answers to any question from a large collection of texts", matches this goal; the others describe different NLP tasks.
  3. Final Answer:

    To find answers to any question from a large collection of texts -> Option C
  4. Quick Check:

    Open-domain QA = Finding answers from many texts [OK]
Hint: Open-domain QA means answering questions from many texts [OK]
Common Mistakes:
  • Confusing QA with translation
  • Thinking QA only summarizes text
  • Mixing QA with text generation
2. Which of the following is the correct sequence of steps in an open-domain QA system?
easy
A. Classify questions, then ignore documents
B. Generate answers first, then find documents
C. Summarize documents, then translate answers
D. Retrieve relevant documents, then read and extract answers

Solution

  1. Step 1: Recall the typical open-domain QA pipeline

    It first retrieves relevant documents, then reads them to find answers.
  2. Step 2: Match options to this pipeline

    Only option D, "Retrieve relevant documents, then read and extract answers", describes this order; the others are incorrect or unrelated.
  3. Final Answer:

    Retrieve relevant documents, then read and extract answers -> Option D
  4. Quick Check:

    QA steps = Retrieve then read [OK]
Hint: QA first finds texts, then reads for answers [OK]
Common Mistakes:
  • Thinking answer generation happens before retrieval
  • Confusing summarization with QA
  • Ignoring the retrieval step
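
The retrieve-then-read order can be sketched with a toy pipeline. Everything here is a simplified assumption for illustration: the retriever ranks by word overlap, and the "reader" just picks the best-matching sentence, standing in for the trained reader model a real system would use. The function names and sample documents are hypothetical.

```python
import re


def tokenize(text):
    """Lowercased set of alphanumeric word types."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, k=1):
    """Step 1: rank documents by how many word types they share with the question."""
    q_tokens = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q_tokens & tokenize(d)), reverse=True)
    return ranked[:k]


def read(question, passages):
    """Step 2 (toy stand-in for a reader model): return the retrieved
    sentence that shares the most words with the question."""
    q_tokens = tokenize(question)
    sentences = [
        s.strip()
        for p in passages
        for s in re.split(r"(?<=[.!?])\s+", p)
        if s.strip()
    ]
    return max(sentences, key=lambda s: len(q_tokens & tokenize(s)))


documents = [
    "The Eiffel Tower is in Paris. It opened in 1889.",
    "Mount Everest is the highest mountain.",
    "Grass is green.",
]
question = "Where is the Eiffel Tower located?"

passages = retrieve(question, documents, k=1)   # retrieve first
answer_sentence = read(question, passages)      # then read
print(answer_sentence)
```

The reader only ever sees what the retriever returns, so a poor retrieval step cannot be repaired later in the pipeline.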
3. Given this Python snippet using a pretrained QA model:
from transformers import pipeline
qa = pipeline('question-answering')
context = "The Eiffel Tower is in Paris."
question = "Where is the Eiffel Tower located?"
result = qa(question=question, context=context)
print(result['answer'])
What will be printed?
medium
A. Paris
B. Eiffel Tower
C. question-answering
D. The Eiffel Tower

Solution

  1. Step 1: Understand the QA pipeline usage

    The pipeline takes a question and context, then returns the answer span from the context.
  2. Step 2: Identify the answer span in the context

    The question asks for location; context says "The Eiffel Tower is in Paris." The answer is "Paris".
  3. Final Answer:

    Paris -> Option A
  4. Quick Check:

    Answer extracted = Paris [OK]
Hint: QA model returns the answer span from context [OK]
Common Mistakes:
  • Printing the question instead of answer
  • Confusing the model name with output
  • Selecting the full sentence instead of answer span
4. You have this code snippet for open-domain QA:
from transformers import pipeline
qa = pipeline('question-answering')
context = "Mount Everest is the highest mountain."
question = "What is the highest mountain?"
result = qa(question=question, context=context)
print(result['answer'])
Suppose running it raises KeyError: 'answer'. What is the most likely cause?
medium
A. The context is empty
B. The pipeline was not properly initialized for question-answering
C. The question is not a string
D. The print statement is incorrect

Solution

  1. Step 1: Analyze the error KeyError: 'answer'

    This error means the result dictionary does not have the key 'answer'.
  2. Step 2: Check pipeline initialization

    If the object bound to qa was not actually created for the 'question-answering' task, its output has a different format and lacks the 'answer' key.
  3. Final Answer:

    The pipeline was not properly initialized for question-answering -> Option B
  4. Quick Check:

    Wrong pipeline type causes missing 'answer' key [OK]
Hint: Ensure pipeline type matches task to get correct keys [OK]
Common Mistakes:
  • Assuming context is empty without checking
  • Ignoring pipeline initialization errors
  • Misreading error as print statement issue
5. You want to improve an open-domain QA system that sometimes returns wrong answers because it reads irrelevant documents. Which approach helps most?
hard
A. Improve the document retrieval step to find more relevant texts
B. Use a smaller pretrained model to speed up reading
C. Remove the retrieval step and read all documents
D. Translate questions to another language before answering

Solution

  1. Step 1: Identify the problem cause

    Wrong answers happen because the system reads irrelevant documents.
  2. Step 2: Choose the best fix

    Improving retrieval to get relevant documents reduces wrong answers effectively.
  3. Final Answer:

    Improve the document retrieval step to find more relevant texts -> Option A
  4. Quick Check:

    Better retrieval = better answer relevance [OK]
Hint: Better retrieval means better answers [OK]
Common Mistakes:
  • Thinking smaller models improve accuracy
  • Removing the retrieval step, which adds overload and noise
  • Expecting question translation to fix relevance
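
As one illustration of what improving the retrieval step might mean, the sketch below ranks documents with a simple TF-IDF sum instead of raw word overlap, so that words appearing in every document (such as "the" and "is") stop contributing to the score. The smoothed IDF formula, the function names, and the sample data are assumptions chosen for this example; production systems typically use BM25 or dense retrievers instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercased list of alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_rank(question, documents):
    """Return (score, document) pairs, best first, using a simple TF-IDF sum."""
    doc_tokens = [tokenize(d) for d in documents]
    n_docs = len(documents)
    # Document frequency: the number of documents each word appears in.
    df = Counter(word for tokens in doc_tokens for word in set(tokens))
    question_words = set(tokenize(question))

    scored = []
    for doc, tokens in zip(documents, doc_tokens):
        tf = Counter(tokens)
        # Smoothed IDF (an assumed variant): words found in every document get weight 0.
        score = sum(
            tf[word] * math.log((1 + n_docs) / (1 + df[word]))
            for word in question_words
        )
        scored.append((score, doc))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample data, for illustration only.
documents = [
    "The sky is blue.",
    "The sun is bright.",
    "The grass is green.",
]
ranked = tfidf_rank("Why is the sky blue?", documents)
best_score, best_doc = ranked[0]
print(best_doc)
```

With plain word overlap, all three documents would earn credit for sharing "the" and "is" with the question; here only the document mentioning "sky" and "blue" receives a nonzero score.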