
Information retrieval basics in NLP - Model Metrics & Evaluation

Which metric matters for Information Retrieval and WHY

In information retrieval, the main goal is to find the documents in a large collection that are relevant to a user's query. The key metrics are precision and recall. Precision is the fraction of retrieved documents that are actually relevant. Recall is the fraction of all relevant documents that the system managed to retrieve. Both matter: we want to find as many relevant documents as possible (high recall) while avoiding irrelevant ones (high precision). The F1 score combines precision and recall into a single number (their harmonic mean). When results are ranked, metrics such as Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG) also measure ranking quality, but precision and recall are the foundation.
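The ranking metrics mentioned above can be sketched in a few lines of Python. This is a minimal illustration, assuming binary relevance labels (1 = relevant, 0 = irrelevant) listed in ranked order. The function names are hypothetical and do not come from a particular library. Average precision here divides by the number of relevant documents in the list, which assumes none were missed. The NDCG "ideal" ranking is built by re-sorting the same list, using the common log2(rank + 1) position discount.

```python
import math


def average_precision(ranked_labels, num_relevant=None):
    """Average precision (AP) for one query's ranked results."""
    if num_relevant is None:
        num_relevant = sum(ranked_labels)  # assumes no relevant doc was missed
    if num_relevant == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant position
    return precision_sum / num_relevant


def mean_average_precision(queries):
    """MAP: mean of per-query AP over a list of ranked label lists."""
    return sum(average_precision(labels) for labels in queries) / len(queries)


def ndcg(ranked_gains, k=None):
    """NDCG@k with the log2(rank + 1) position discount."""
    gains = ranked_gains if k is None else ranked_gains[:k]
    ideal = sorted(ranked_gains, reverse=True)[:len(gains)]

    def dcg(values):
        return sum(g / math.log2(rank + 1) for rank, g in enumerate(values, start=1))

    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0


print(round(average_precision([1, 0, 1, 0, 0]), 3))                     # 0.833
print(round(mean_average_precision([[1, 0, 1, 0, 0], [0, 1, 1]]), 3))  # 0.708
print(round(ndcg([1, 0, 1, 0, 0]), 3))                                 # 0.92
```

A perfect ranking gives NDCG = 1.0. Moving relevant documents lower in the list reduces both AP and NDCG, even though set-based precision and recall over the whole list stay the same.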

Confusion Matrix for Information Retrieval
                | Retrieved            | Not Retrieved        |
----------------|----------------------|----------------------|
Relevant Docs   | True Positives (TP)  | False Negatives (FN) |
Irrelevant Docs | False Positives (FP) | True Negatives (TN)  |

Example: Suppose a collection has 100 documents, 30 of which are relevant to the query. The system retrieves 40 documents: 25 relevant (TP = 25) and 15 irrelevant (FP = 15). It misses 5 relevant documents (FN = 5). The remaining 55 documents are irrelevant and not retrieved (TN = 55).
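The example counts can be plugged into the standard formulas: precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = the harmonic mean of the two, and accuracy = (TP + TN) / total. The helper name `retrieval_metrics` is hypothetical. This is a minimal sketch, not a library API.

```python
def retrieval_metrics(tp, fp, fn, tn):
    """Set-based retrieval metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # relevant share of retrieved docs
    recall = tp / (tp + fn) if tp + fn else 0.0     # retrieved share of relevant docs
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Counts from the example: 100 documents, 40 retrieved, 30 relevant.
m = retrieval_metrics(tp=25, fp=15, fn=5, tn=55)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
# precision: 0.625
# recall: 0.833
# f1: 0.714
# accuracy: 0.800
```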

Precision vs Recall Tradeoff with Examples

Imagine a search engine. If it shows only a few documents that it is very sure about, precision is high but recall is low because many relevant documents are missed. If it shows many documents including less certain ones, recall is high but precision drops because more irrelevant documents appear.
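One way to see this tradeoff is to cut a ranked result list at different depths k. The ranking below is hypothetical (1 = relevant, 0 = irrelevant, highest-scored first). The example assumes the collection contains 5 relevant documents in total. `precision_recall_at_k` is an illustrative helper.

```python
def precision_recall_at_k(ranked_labels, total_relevant, k):
    """Precision@k and recall@k for one ranked result list."""
    hits = sum(ranked_labels[:k])  # relevant docs among the top k
    return hits / k, hits / total_relevant


# Hypothetical ranking; assume 5 relevant documents exist in the collection.
ranking = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]
for k in (2, 5, 10):
    p, r = precision_recall_at_k(ranking, total_relevant=5, k=k)
    print(f"k={k:2d}  precision={p:.2f}  recall={r:.2f}")
# k= 2  precision=1.00  recall=0.40
# k= 5  precision=0.60  recall=0.60
# k=10  precision=0.50  recall=1.00
```

Showing more results (larger k) raises recall from 0.40 to 1.00, while precision falls from 1.00 to 0.50.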

Example 1: Medical research paper search
Recall is more important. Missing a relevant paper could mean missing critical information.

Example 2: Shopping site search
Precision is more important. Showing irrelevant products annoys users.

What Good vs Bad Metric Values Look Like

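Good (rough rule of thumb; targets vary by task): Precision and recall both above 0.8 mean the system finds most relevant documents while keeping irrelevant results few.

Bad: Precision below 0.5 means more than half of the retrieved documents are irrelevant. Recall below 0.5 means more than half of the relevant documents are missed.

An F1 score below about 0.6 usually means that precision, recall, or both are weak. Because F1 is a harmonic mean, it is pulled toward the lower of the two.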

Common Pitfalls in Information Retrieval Metrics
  • Accuracy paradox: Accuracy is misleading because most documents in a collection are irrelevant to any given query. A system that retrieves nothing can therefore have high accuracy but zero recall.
  • Ignoring ranking: Plain precision and recall treat the retrieved set as unordered, but users care most about the top results. Ranking-aware metrics such as precision@k, MAP, and NDCG account for order.
  • Data leakage: Using test queries or documents during training or tuning can falsely inflate metrics.
  • Overfitting: Optimizing too much for training queries can reduce generalization to new queries.
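The accuracy paradox from the first bullet is easy to reproduce with made-up numbers. This sketch assumes a hypothetical collection of 1,000 documents, only 10 of which are relevant to the query, and a system that retrieves nothing.

```python
# Hypothetical collection: 1,000 documents, only 10 relevant to the query.
num_docs, num_relevant = 1000, 10

# A "system" that retrieves nothing: no true or false positives,
# every relevant document is missed, every irrelevant one is a true negative.
tp, fp = 0, 0
fn = num_relevant
tn = num_docs - num_relevant

accuracy = (tp + tn) / (tp + fp + fn + tn)
recall = tp / (tp + fn)
print(f"accuracy={accuracy:.2f}  recall={recall:.2f}")  # accuracy=0.99  recall=0.00
```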
Self Check

Your information retrieval system has 98% accuracy but only 12% recall on relevant documents. Is it good for production? Why or why not?

Answer: No, it is not good. The high accuracy is misleading because most documents are irrelevant. The very low recall means the system misses most relevant documents, which defeats the purpose of retrieval.

Key Result
Precision and recall are key metrics in information retrieval to balance finding relevant documents and avoiding irrelevant ones.

Practice

1. What is the main goal of information retrieval in natural language processing?
easy
A. To translate text from one language to another
B. To find relevant documents based on a user's query
C. To generate new text automatically
D. To summarize long documents into short ones

Solution

  1. Step 1: Understand the purpose of information retrieval

    Information retrieval is about searching and finding documents that match a user's query.
  2. Step 2: Compare with other NLP tasks

    Translation, text generation, and summarization are different tasks unrelated to searching documents.
  3. Final Answer:

    To find relevant documents based on a user's query -> Option B
  4. Quick Check:

    Information retrieval = finding relevant documents
Hint: Remember: retrieval means finding, not creating
Common Mistakes:
  • Confusing retrieval with translation
  • Thinking retrieval generates new text
  • Mixing retrieval with summarization
2. Which of the following Python code snippets correctly checks if the word 'apple' is in a document string doc (case-insensitive)?
easy
A. if 'Apple' == doc:
B. if doc.contains('apple'):
C. if 'apple' in doc.lower():
D. if doc.find('apple') == -1:

Solution

  1. Step 1: Understand case-insensitive search

    To ignore case, convert the document to lowercase and check if 'apple' is in it.
  2. Step 2: Analyze each option

    if 'apple' in doc.lower(): uses doc.lower() and checks membership correctly. if doc.contains('apple'): uses a non-existent method contains. if 'Apple' == doc: compares whole string, not membership. if doc.find('apple') == -1: checks if find returns -1, which means not found, so logic is reversed.
  3. Final Answer:

    if 'apple' in doc.lower(): -> Option C
  4. Quick Check:

    Use lower() + in for case-insensitive check
Hint: Use lower() before checking membership
Common Mistakes:
  • Using non-existent string methods
  • Comparing whole string instead of membership
  • Misinterpreting find() return values
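A quick runnable check of the options uses hypothetical sample strings. `mentions_apple` is an illustrative name for Option C's test.

```python
def mentions_apple(doc):
    """Case-insensitive check that 'apple' appears anywhere in doc (Option C)."""
    return 'apple' in doc.lower()


for doc in ["Apple pie recipe", "fresh apple juice", "Banana smoothie"]:
    print(doc, "->", mentions_apple(doc))
# Apple pie recipe -> True
# fresh apple juice -> True
# Banana smoothie -> False

# Option D inverts the logic: find() returns -1 only when the text is absent,
# so this is True exactly when lowercase 'apple' is NOT found.
print("fresh apple juice".find('apple') == -1)  # False, although 'apple' is present

# Option B fails at runtime: str has no contains() method.
print(hasattr("any text", "contains"))  # False
```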
3. Given the following Python code, what will be the output?
documents = ['Apple pie recipe', 'Banana smoothie', 'apple tart']
query = 'apple'
results = [doc for doc in documents if query.lower() in doc.lower()]
print(results)
medium
A. []
B. ['apple tart']
C. ['Apple pie recipe']
D. ['Apple pie recipe', 'apple tart']

Solution

  1. Step 1: Understand the list comprehension filtering

    The code checks each document if the lowercase query 'apple' is in the lowercase document string.
  2. Step 2: Check each document

    'Apple pie recipe' contains 'apple' ignoring case, so included. 'Banana smoothie' does not contain 'apple'. 'apple tart' contains 'apple'. So results are the first and third documents.
  3. Final Answer:

    ['Apple pie recipe', 'apple tart'] -> Option D
  4. Quick Check:

    Case-insensitive filter returns matching docs
Hint: Check each document with lowercase query and doc
Common Mistakes:
  • Ignoring case and missing matches
  • Including documents without the query word
  • Confusing list comprehension output
4. The following code is intended to find documents containing the word 'data' (case-insensitive), but it returns an empty list. What is the error?
docs = ['Data science', 'Big Data', 'Machine learning']
query = 'data'
results = [d for d in docs if d.find(query) != -1]
print(results)
medium
A. The find method is case-sensitive, so it misses 'Data science'
B. The find method returns -1 when found, so condition is wrong
C. The list comprehension syntax is incorrect
D. The variable query is not defined

Solution

  1. Step 1: Understand find behavior

    The find method is case-sensitive, so searching 'data' in 'Data science' returns -1 (not found).
  2. Step 2: Identify why results is empty

    'Data science'.find('data') and 'Big Data'.find('data') both return -1 because of the uppercase 'D', and 'Machine learning' does not contain 'data' at all. Every condition is False, so results is empty.
  3. Final Answer:

    The find method is case-sensitive, so it misses 'Data science' -> Option A
  4. Quick Check:

    find() is case-sensitive
Hint: Remember find() is case-sensitive; use lower()
Common Mistakes:
  • Assuming find() ignores case
  • Misunderstanding find() return values
  • Thinking list comprehension syntax is wrong
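Here is a minimal corrected version of the snippet. Lowercasing both the query and each document before matching makes the check case-insensitive. Using `in` replaces the `find(...) != -1` test; `d.lower().find(query.lower()) != -1` would behave the same.

```python
docs = ['Data science', 'Big Data', 'Machine learning']
query = 'data'

# Lowercase both sides so the match ignores case.
results = [d for d in docs if query.lower() in d.lower()]
print(results)  # ['Data science', 'Big Data']
```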
5. You have a list of documents:
docs = ['Data Science is fun', 'I love machine learning', 'Deep learning and data']

You want to create a dictionary where keys are unique words (case-insensitive) from all documents, and values are lists of document indices where the word appears. Which code snippet correctly does this?
hard
A.
    word_docs = {}
    for i, doc in enumerate(docs):
        for word in doc.lower().split():
            word_docs.setdefault(word, []).append(i)
B.
    word_docs = {}
    for i, doc in enumerate(docs):
        for word in doc.split():
            word_docs[word].append(i)
C.
    word_docs = {word: i for i, doc in enumerate(docs) for word in doc.lower().split()}
D.
    word_docs = {}
    for doc in docs:
        for word in doc.lower().split():
            word_docs[word] = doc

Solution

  1. Step 1: Understand the goal

    Create a dictionary mapping each unique lowercase word to a list of document indices where it appears.
  2. Step 2: Analyze each option

    Option A uses setdefault to create each list on first use and appends the document index for every lowercased word, which is exactly what is asked. Option B never initializes the lists, so word_docs[word].append(i) raises a KeyError on the first word, and it does not lowercase words. Option C is a dict comprehension that maps each word to a single index (the last document it appears in), not a list. Option D stores the document string instead of indices and overwrites it each time the word reappears.
  3. Final Answer:

    The setdefault loop over lowercased words -> Option A
  4. Quick Check:

    Use setdefault and lowercase words for correct mapping
Hint: Use setdefault to build lists for each word
Common Mistakes:
  • Not initializing lists before appending
  • Ignoring case normalization
  • Overwriting dictionary values instead of appending
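Option A builds what IR systems call an inverted index: a map from each term to the documents that contain it. The sketch below runs Option A on the example documents. It also shows one alternative that produces the same result with `collections.defaultdict(list)`.

```python
from collections import defaultdict

docs = ['Data Science is fun', 'I love machine learning', 'Deep learning and data']

# Option A: setdefault creates the list the first time a word is seen.
word_docs = {}
for i, doc in enumerate(docs):
    for word in doc.lower().split():
        word_docs.setdefault(word, []).append(i)

print(word_docs['data'])      # [0, 2]
print(word_docs['learning'])  # [1, 2]

# Same result with defaultdict, which creates missing lists automatically.
index = defaultdict(list)
for i, doc in enumerate(docs):
    for word in doc.lower().split():
        index[word].append(i)
print(dict(index) == word_docs)  # True
```

Both versions append an index once per occurrence, so a word repeated within one document would list that document twice. Collecting indices in a `set`, or checking the last appended index, would avoid that if needed.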