Information retrieval helps us find useful information quickly from large collections like the internet or documents.
Information retrieval basics in NLP
class SimpleSearchEngine:
    def __init__(self, documents):
        self.documents = documents

    def search(self, query):
        results = []
        for index, document in enumerate(self.documents):
            if query.lower() in document.lower():
                results.append((index, document))
        return results
This class stores documents and searches for a query word inside them.
The search method returns documents containing the query, ignoring case.
documents = ["Apple pie recipe", "Banana smoothie", "Cherry tart"]
search_engine = SimpleSearchEngine(documents)
results = search_engine.search("apple")
print(results)

documents = []
search_engine = SimpleSearchEngine(documents)
results = search_engine.search("anything")
print(results)

documents = ["Only one document"]
search_engine = SimpleSearchEngine(documents)
results = search_engine.search("document")
print(results)

documents = ["Start here", "Middle part", "End now"]
search_engine = SimpleSearchEngine(documents)
results = search_engine.search("end")
print(results)
This program creates a simple search engine that looks for a word in a list of documents and prints the matching documents with their indexes.
class SimpleSearchEngine:
    def __init__(self, documents):
        self.documents = documents

    def search(self, query):
        results = []
        for index, document in enumerate(self.documents):
            if query.lower() in document.lower():
                results.append((index, document))
        return results

# Create a list of documents
documents = [
    "Machine learning basics",
    "Deep learning introduction",
    "Natural language processing overview",
    "Information retrieval techniques",
    "Data science and AI"
]

# Initialize the search engine with documents
search_engine = SimpleSearchEngine(documents)

# Search for the word 'learning'
search_results = search_engine.search("learning")

# Print the results
print("Search results for 'learning':")
for index, doc in search_results:
    print(f"Document {index}: {doc}")
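Time complexity of search is roughly O(n * m), where n is the number of documents and m is the average document length, because every document is lowercased and scanned on each query.
Space complexity is O(n) document references, or O(n * m) characters in total, to store the documents.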
Common mistake: Not handling case differences can miss matches.
Use this simple search for small collections; for large data, use indexes or specialized tools.
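One common way to index a collection is an inverted index, which maps each word to the documents that contain it. The sketch below is a minimal illustration, not a production design: the class name `InvertedIndexSearch` and its whitespace tokenization are assumptions introduced here, and it matches whole words rather than substrings, so its results can differ from those of `SimpleSearchEngine`.

```python
from collections import defaultdict


class InvertedIndexSearch:
    """Hypothetical word-level index; not part of the original example."""

    def __init__(self, documents):
        self.documents = documents
        # Map each lowercase word to the set of document indices containing it.
        self.index = defaultdict(set)
        for doc_id, document in enumerate(documents):
            for word in document.lower().split():
                self.index[word].add(doc_id)

    def search(self, query):
        # Look up every query word and keep documents that contain all of them.
        words = query.lower().split()
        if not words:
            return []
        matching_ids = set.intersection(*(self.index.get(w, set()) for w in words))
        return [(doc_id, self.documents[doc_id]) for doc_id in sorted(matching_ids)]


documents = [
    "Machine learning basics",
    "Deep learning introduction",
    "Natural language processing overview",
]
engine = InvertedIndexSearch(documents)
print(engine.search("learning"))       # documents 0 and 1
print(engine.search("deep learning"))  # document 1 only
```

The index is built once, so each query only touches the entries for its own words instead of scanning every document.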
Information retrieval helps find relevant documents from many options.
Simple search checks if query words appear in documents.
Case-insensitive search improves matching results.
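Because the check in `search` is a substring test, a query can also match inside a longer word. The sketch below contrasts that behavior with a whole-word match. The helper names `substring_match` and `word_match` and the sample text are illustrative assumptions, and the word split is deliberately simple (a regular expression for runs of letters, digits, and underscores).

```python
import re


def substring_match(query, document):
    # Same test as SimpleSearchEngine.search: case-insensitive substring check.
    return query.lower() in document.lower()


def word_match(query, document):
    # Split the document into lowercase words and compare whole words only.
    words = re.findall(r"\w+", document.lower())
    return query.lower() in words


document = "Weekend plans"
print(substring_match("end", document))  # True: "end" is inside "Weekend"
print(word_match("end", document))       # False: no standalone word "end"
print(word_match("plans", document))     # True
```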
Practice
Solution
Step 1: Understand the purpose of information retrieval
Information retrieval is about searching and finding documents that match a user's query.
Step 2: Compare with other NLP tasks
Translation, text generation, and summarization are different tasks unrelated to searching documents.
Final Answer:
To find relevant documents based on a user's query -> Option B
Quick Check:
Information retrieval = finding relevant documents [OK]
- Confusing retrieval with translation
- Thinking retrieval generates new text
- Mixing retrieval with summarization
doc (case-insensitive)?
Solution
Step 1: Understand case-insensitive search
To ignore case, convert the document to lowercase and check if 'apple' is in it.
Step 2: Analyze each option
if 'apple' in doc.lower(): uses doc.lower() and checks membership correctly.
if doc.contains('apple'): uses a non-existent method, contains.
if 'Apple' == doc: compares the whole string, not membership.
if doc.find('apple') == -1: checks whether find returns -1, which means not found, so the logic is reversed.
Final Answer:
if 'apple' in doc.lower(): -> Option C
Quick Check:
Use lower() + in for a case-insensitive check [OK]
lower() before checking membership [OK]
- Using non-existent string methods
- Comparing whole string instead of membership
- Misinterpreting find() return values
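A quick way to check this reasoning is to evaluate each condition against a sample string. The snippet below is a small sketch; the value assigned to `doc` is an assumption chosen for illustration.

```python
doc = "Apple pie recipe"  # sample value, assumed for illustration

# Lowercase the document, then test membership: finds the match.
print("apple" in doc.lower())    # True

# Python strings have no contains() method, so calling it would raise AttributeError.
print(hasattr(doc, "contains"))  # False

# Equality compares the whole string, not part of it.
print("Apple" == doc)            # False

# find() is case-sensitive and returns -1 when nothing is found,
# so this condition is True exactly when there is NO match.
print(doc.find("apple") == -1)   # True
```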
documents = ['Apple pie recipe', 'Banana smoothie', 'apple tart']
query = 'apple'
results = [doc for doc in documents if query.lower() in doc.lower()]
print(results)
Solution
Step 1: Understand the list comprehension filtering
The code checks, for each document, whether the lowercase query 'apple' is in the lowercase document string.
Step 2: Check each document
'Apple pie recipe' contains 'apple' ignoring case, so it is included. 'Banana smoothie' does not contain 'apple'. 'apple tart' contains 'apple'. So the results are the first and third documents.
Final Answer:
['Apple pie recipe', 'apple tart'] -> Option D
Quick Check:
Case-insensitive filter returns matching docs [OK]
- Ignoring case and missing matches
- Including documents without the query word
- Confusing list comprehension output
docs = ['Data science', 'Big Data', 'Machine learning']
query = 'data'
results = [d for d in docs if d.find(query) != -1]
print(results)
Solution
Step 1: Understand find behavior
The find method is case-sensitive, so searching for 'data' in 'Data science' returns -1 (not found).
Step 2: Identify why results is empty
'Data science'.find('data') returns -1 because of the uppercase 'D'. Similarly, 'Big Data'.find('data') returns -1. 'Machine learning' doesn't contain 'data'. So results is empty.
Final Answer:
The find method is case-sensitive, so it misses 'Data science' -> Option A
Quick Check:
find() is case-sensitive [OK]
find() is case-sensitive; use lower() [OK]
- Assuming find() ignores case
- Misunderstanding find() return values
- Thinking list comprehension syntax is wrong
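One way to repair the snippet, sketched below, is to lowercase both the query and each document before calling `find()`. The variable name `fixed_results` is introduced here for illustration.

```python
docs = ['Data science', 'Big Data', 'Machine learning']
query = 'data'

# Original check: case-sensitive, so nothing matches.
results = [d for d in docs if d.find(query) != -1]
print(results)  # []

# Lowercase both sides so the comparison ignores case.
fixed_results = [d for d in docs if d.lower().find(query.lower()) != -1]
print(fixed_results)  # ['Data science', 'Big Data']
```

The `in` operator (`query.lower() in d.lower()`) expresses the same test more directly when the position of the match is not needed.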
docs = ['Data Science is fun', 'I love machine learning', 'Deep learning and data']
You want to create a dictionary where keys are unique words (case-insensitive) from all documents, and values are lists of document indices where the word appears. Which code snippet correctly does this?
Solution
Step 1: Understand the goal
Create a dictionary mapping each unique lowercase word to a list of document indices where it appears.
Step 2: Analyze each option
word_docs = {}
for i, doc in enumerate(docs):
    for word in doc.lower().split():
        word_docs.setdefault(word, []).append(i)
This snippet uses setdefault to initialize the lists and appends indices correctly with lowercase words.

word_docs = {}
for i, doc in enumerate(docs):
    for word in doc.split():
        word_docs[word].append(i)
This snippet does not initialize the lists before appending and ignores case.

word_docs = {word: i for i, doc in enumerate(docs) for word in doc.lower().split()}
This snippet creates a dict that keeps only the last index for each word, not lists.

word_docs = {}
for doc in docs:
    for word in doc.lower().split():
        word_docs[word] = doc
This snippet overwrites values with document strings, not indices.

Final Answer:
word_docs = {}
for i, doc in enumerate(docs):
    for word in doc.lower().split():
        word_docs.setdefault(word, []).append(i)
-> Option A
Quick Check:
Use setdefault and lowercase words for correct mapping [OK]
setdefault to build lists for each word [OK]
- Not initializing lists before appending
- Ignoring case normalization
- Overwriting dictionary values instead of appending
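To see how the options differ in practice, the sketch below runs the accepted loop alongside two of the rejected ones on the `docs` list from the question. The variable names `last_only` and `broken` are introduced here for illustration.

```python
docs = ['Data Science is fun', 'I love machine learning', 'Deep learning and data']

# Accepted approach: setdefault creates the list the first time a word is seen.
word_docs = {}
for i, doc in enumerate(docs):
    for word in doc.lower().split():
        word_docs.setdefault(word, []).append(i)
print(word_docs['data'])      # [0, 2]
print(word_docs['learning'])  # [1, 2]

# Dict comprehension: each word keeps only the last index it was seen at.
last_only = {word: i for i, doc in enumerate(docs) for word in doc.lower().split()}
print(last_only['data'])      # 2

# Appending without initializing the list raises KeyError on the first word.
broken = {}
try:
    for i, doc in enumerate(docs):
        for word in doc.split():
            broken[word].append(i)
except KeyError as error:
    print(f"KeyError: {error}")  # KeyError: 'Data'
```

If a word occurred twice in the same document, the accepted loop would append that index twice; storing sets instead of lists would avoid the duplicate.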
