Open-domain Question Answering (QA) helps computers find answers to any question from a large collection of texts. It makes information easy to get, like asking a smart assistant.
Open-domain QA basics in NLP
Syntax
1. Input: A question in natural language.
2. Retrieve: Find relevant documents or passages from a large text collection.
3. Reader: Use a model to read the retrieved text and find the exact answer.
4. Output: Return the answer text.
The process usually has two parts: retrieval and reading.
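Pretrained language models act as readers. Encoder models such as BERT are typically used as extractive readers that select an answer span from the passage, while decoder models such as GPT can act as generative readers that write the answer.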
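The retrieve-then-read steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not a production system: the retriever is a simple TF-IDF cosine-similarity ranker written from scratch, and the reader is any callable that accepts `question` and `context` keyword arguments and returns a dict with `'answer'` and `'score'` keys, the same shape the Hugging Face `question-answering` pipeline returns for one question. The helpers `tokenize`, `tfidf_vectors`, `cosine`, `retrieve`, and `answer_question` are hypothetical names introduced for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric tokens (a deliberately simple tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(texts):
    # Build one sparse TF-IDF vector (dict of term -> weight) per text, plus the IDF table.
    token_lists = [tokenize(t) for t in texts]
    n = len(texts)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF: rarer terms get larger weights.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({term: count * idf[term] for term, count in tf.items()})
    return vectors, idf


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=2):
    # Step 2 (Retrieve): rank documents by TF-IDF cosine similarity to the question.
    doc_vectors, idf = tfidf_vectors(documents)
    q_tf = Counter(tokenize(question))
    # Question terms that never occur in the collection are ignored.
    q_vec = {term: count * idf[term] for term, count in q_tf.items() if term in idf}
    ranked = sorted(range(len(documents)),
                    key=lambda i: cosine(q_vec, doc_vectors[i]), reverse=True)
    return [documents[i] for i in ranked[:k]]


def answer_question(question, documents, reader, k=2):
    # Step 3 (Reader): run the reader on each retrieved passage, keep the best-scoring answer.
    best = None
    for passage in retrieve(question, documents, k):
        result = reader(question=question, context=passage)
        if best is None or result["score"] > best["score"]:
            best = result
    # Step 4 (Output): return the answer text.
    return best["answer"] if best else None
```

In real use, `reader` could be the `qa` object created with `pipeline('question-answering')`, since calling it with `question=` and `context=` returns such a dict. Reading the top k passages instead of only the first one gives the reader a second chance when the best passage is not ranked first.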
Examples
Question: "Who wrote the book '1984'?"
Retrieve: Find Wikipedia page about '1984'.
Reader: Extract 'George Orwell' as answer.
Output: "George Orwell"
Question: "What is the capital of France?"
Retrieve: Find documents about France.
Reader: Extract 'Paris' as answer.
Output: "Paris"
Sample Model
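This code uses a ready-made model (the default model of the Hugging Face transformers question-answering pipeline) to answer a question from a given text. It covers only the reader step: the context passage is supplied directly instead of being retrieved. It prints the answer and the model's confidence score.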
from transformers import pipeline

# Load a question-answering pipeline
qa = pipeline('question-answering')

# Define the question and context
question = "Who developed the theory of relativity?"
context = (
    "Albert Einstein was a physicist who developed the theory of relativity, "
    "one of the two pillars of modern physics."
)

# Get the answer
result = qa(question=question, context=context)
print(f"Answer: {result['answer']}")
print(f"Score: {result['score']:.2f}")
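Readers like the one in the sample model are extractive: the model gives every context token a start score and an end score, and the answer is the span whose start and end together score highest. The sketch below shows only that selection step, on hand-written scores and whole words instead of the subword tokens a real model uses. It assumes softmax-normalized start and end probabilities, scores a span as their product, and caps the answer length; this reflects the common approach for BERT-style readers, but the exact post-processing inside transformers may differ. `softmax`, `best_span`, and the example logits are hypothetical, for illustration only.

```python
import math


def softmax(logits):
    # Convert raw scores into probabilities (subtract the max for numerical stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(tokens, start_logits, end_logits, max_answer_len=15):
    # Return (answer_text, score) for the highest-scoring valid span.
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    best = (None, -1.0)
    for i in range(len(tokens)):
        # A valid span ends at or after its start and is at most max_answer_len tokens long.
        for j in range(i, min(i + max_answer_len, len(tokens))):
            score = p_start[i] * p_end[j]
            if score > best[1]:
                best = (" ".join(tokens[i:j + 1]), score)
    return best


tokens = ["Albert", "Einstein", "was", "a", "physicist"]
start_logits = [5.0, 1.0, 0.0, 0.0, 0.5]  # made-up scores for illustration
end_logits = [0.5, 4.5, 0.0, 0.0, 1.0]    # made-up scores for illustration
answer, score = best_span(tokens, start_logits, end_logits)
print(answer, round(score, 2))
```

The product of the two probabilities plays the role of a confidence value: it is high only when the model is sure about both where the answer starts and where it ends.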
Important Notes
Open-domain QA needs a large text collection to find answers from.
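Retrieval quality limits answer quality: if the retriever misses the passage that contains the answer, the reader cannot extract it.
Pretrained language models, usually fine-tuned on QA datasets such as SQuAD, help readers understand the text and extract the right answer span.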
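Because retrieval quality limits answer quality, it helps to measure the retriever on its own. One common measure is top-k retrieval accuracy: the fraction of questions for which at least one of the top k retrieved passages contains an accepted answer string. A minimal sketch, assuming each question comes with a ranked list of passages and a list of accepted answers; the function name `top_k_accuracy` and the sample data are hypothetical.

```python
def top_k_accuracy(ranked_passages, gold_answers, k):
    # ranked_passages[q]: passages retrieved for question q, best first.
    # gold_answers[q]: accepted answer strings for question q.
    hits = 0
    for passages, answers in zip(ranked_passages, gold_answers):
        top = [p.lower() for p in passages[:k]]
        # Count a hit if any accepted answer appears (case-insensitively) in any top-k passage.
        if any(a.lower() in p for a in answers for p in top):
            hits += 1
    return hits / len(ranked_passages)


ranked = [
    ["Nineteen Eighty-Four is a novel by George Orwell.", "Orwell also wrote Animal Farm."],
    ["France borders Spain and Germany.", "Paris is the capital of France."],
]
gold = [["George Orwell"], ["Paris"]]
print(top_k_accuracy(ranked, gold, k=1))  # 0.5
print(top_k_accuracy(ranked, gold, k=2))  # 1.0
```

In this toy data the answer for the second question is retrieved but ranked second, which is why readers are often run over several passages rather than only the top one.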
Summary
Open-domain QA finds answers to any question from large collections of text.
It works by retrieving relevant texts and reading them to find answers.
Pretrained models like BERT make reading and answering easier.
Practice
1. What is the main goal of open-domain question answering (QA)?
easy
Solution
Step 1: Understand the definition of open-domain QA
Open-domain QA aims to answer questions using a wide range of texts, not limited to a specific topic.
Step 2: Compare options with this definition
Only "To find answers to any question from a large collection of texts" matches this goal; others describe different NLP tasks.
Final Answer:
To find answers to any question from a large collection of texts -> Option C
Quick Check:
Open-domain QA = Finding answers from many texts [OK]
Hint: Open-domain QA means answering questions from many texts [OK]
Common Mistakes:
- Confusing QA with translation
- Thinking QA only summarizes text
- Mixing QA with text generation
2. Which of the following is the correct sequence of steps in an open-domain QA system?
easy
Solution
Step 1: Recall the typical open-domain QA pipeline
It first retrieves relevant documents, then reads them to find answers.
Step 2: Match options to this pipeline
Only "Retrieve relevant documents, then read and extract answers" correctly describes this order; others are incorrect or unrelated.
Final Answer:
Retrieve relevant documents, then read and extract answers -> Option D
Quick Check:
QA steps = Retrieve then read [OK]
Hint: QA first finds texts, then reads for answers [OK]
Common Mistakes:
- Thinking answer generation happens before retrieval
- Confusing summarization with QA
- Ignoring the retrieval step
3. Given this Python snippet using a pretrained QA model:
from transformers import pipeline
qa = pipeline('question-answering')
context = "The Eiffel Tower is in Paris."
question = "Where is the Eiffel Tower located?"
result = qa(question=question, context=context)
print(result['answer'])
What will be printed?
medium
Solution
Step 1: Understand the QA pipeline usage
The pipeline takes a question and a context, then returns the answer span extracted from the context.
Step 2: Identify the answer span in the context
The question asks for a location; the context says "The Eiffel Tower is in Paris." The answer span is "Paris".
Final Answer:
Paris -> Option A
Quick Check:
Answer extracted = Paris [OK]
Hint: QA model returns the answer span from context [OK]
Common Mistakes:
- Printing the question instead of answer
- Confusing the model name with output
- Selecting the full sentence instead of answer span
4. You have this code snippet for open-domain QA:
from transformers import pipeline
qa = pipeline('question-answering')
context = "Mount Everest is the highest mountain."
question = "What is the highest mountain?"
result = qa(question=question, context=context)
print(result['answer'])
But it raises a KeyError: 'answer'. What is the likely cause?
medium
Solution
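Step 1: Analyze the error KeyError: 'answer'
This error means the result dictionary does not have the key 'answer'.
Step 2: Check pipeline initialization
A pipeline correctly initialized for 'question-answering' returns, for a single question, a dict with the keys 'score', 'start', 'end', and 'answer'. Since the snippet as shown does request 'question-answering', the error implies that the qa object actually used at runtime was built for a different task (for example, qa was reassigned elsewhere), so its output format differs and lacks 'answer'.
Final Answer:
The pipeline was not properly initialized for question-answering -> Option B
Quick Check:
Wrong pipeline type causes missing 'answer' key [OK]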
Hint: Ensure pipeline type matches task to get correct keys [OK]
Common Mistakes:
- Assuming context is empty without checking
- Ignoring pipeline initialization errors
- Misreading error as print statement issue
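A defensive check makes this kind of failure easier to diagnose. The sketch below is a hypothetical helper named `extract_answer` that validates a pipeline result before indexing it and raises an error naming the keys that were actually returned. It assumes, as described above, that a single question-answering result is a dict containing 'answer'; the list handling is an extra assumption to cover calls that return a one-element list.

```python
def extract_answer(result):
    # A question-answering pipeline returns a dict for a single question;
    # other tasks (or some batched calls) can return lists or dicts with other keys.
    if isinstance(result, list):
        if len(result) != 1:
            raise ValueError(f"Expected one result, got {len(result)}")
        result = result[0]
    if not isinstance(result, dict) or "answer" not in result:
        found = sorted(result) if isinstance(result, dict) else type(result).__name__
        raise ValueError(
            f"Result has no 'answer' key (got {found}); check that the pipeline "
            "was created with task 'question-answering'."
        )
    return result["answer"]


# Made-up result dict for illustration.
print(extract_answer({"answer": "Mount Everest", "score": 0.9}))
```

Failing with a message that lists the returned keys points directly at the wrong pipeline type, instead of leaving only a bare KeyError.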
5. You want to improve an open-domain QA system that sometimes returns wrong answers because it reads irrelevant documents. Which approach helps most?
hard
Solution
Step 1: Identify the problem cause
Wrong answers happen because the system reads irrelevant documents.
Step 2: Choose the best fix
Improving retrieval to get relevant documents reduces wrong answers effectively.
Final Answer:
Improve the document retrieval step to find more relevant texts -> Option A
Quick Check:
Better retrieval = better answer relevance [OK]
Hint: Better retrieval means better answers [OK]
Common Mistakes:
- Thinking smaller models improve accuracy
- Removing retrieval causes overload and noise
- Translating questions doesn't fix relevance
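One concrete way to improve the retrieval step is to rank documents with a stronger lexical scoring function such as Okapi BM25, which rewards rare query terms, saturates repeated terms, and normalizes for document length. The sketch below is a from-scratch BM25 ranker using the typical values k1 = 1.5 and b = 0.75; the function names `tokenize` and `bm25_rank` and the sample documents are hypothetical. Dense neural retrievers are another common option, not shown here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, documents, k1=1.5, b=0.75):
    # Return document indices sorted from most to least relevant under Okapi BM25.
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # IDF with +1 inside the log keeps it non-negative for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


docs = [
    "Paris is the capital of France.",
    "France is known for wine and cheese, and the French Alps are in the east of the country.",
    "Berlin is the capital of Germany.",
]
print(bm25_rank("What is the capital of France?", docs))  # [0, 2, 1]
```

The long second document mentions "the" three times, but BM25's saturation and length normalization keep those repetitions from outweighing the short document that matches the rarer terms "capital" and "France".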
