Information retrieval helps us find useful information quickly from large collections like the internet or documents.
Information retrieval basics in NLP
class SimpleSearchEngine:
    def __init__(self, documents):
        self.documents = documents

    def search(self, query):
        results = []
        for index, document in enumerate(self.documents):
            if query.lower() in document.lower():
                results.append((index, document))
        return results
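This class stores a list of documents and searches each one for a query string.
The search method returns an (index, document) pair for every document that contains the query as a substring, ignoring case. Because it matches substrings, a query such as "apple" would also match a document containing "pineapple".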
# A query that matches one document, regardless of case
documents = ["Apple pie recipe", "Banana smoothie", "Cherry tart"]
search_engine = SimpleSearchEngine(documents)
results = search_engine.search("apple")
print(results)  # [(0, 'Apple pie recipe')]

# An empty collection returns no results
documents = []
search_engine = SimpleSearchEngine(documents)
results = search_engine.search("anything")
print(results)  # []

# A collection with a single document
documents = ["Only one document"]
search_engine = SimpleSearchEngine(documents)
results = search_engine.search("document")
print(results)  # [(0, 'Only one document')]

# A match in the last document
documents = ["Start here", "Middle part", "End now"]
search_engine = SimpleSearchEngine(documents)
results = search_engine.search("end")
print(results)  # [(2, 'End now')]
This program creates a simple search engine that looks for a word in a list of documents and prints the matching documents with their indexes.
class SimpleSearchEngine:
    def __init__(self, documents):
        self.documents = documents

    def search(self, query):
        results = []
        for index, document in enumerate(self.documents):
            if query.lower() in document.lower():
                results.append((index, document))
        return results

# Create a list of documents
documents = [
    "Machine learning basics",
    "Deep learning introduction",
    "Natural language processing overview",
    "Information retrieval techniques",
    "Data science and AI"
]

# Initialize the search engine with documents
search_engine = SimpleSearchEngine(documents)

# Search for the word 'learning'
search_results = search_engine.search("learning")

# Print the results
print("Search results for 'learning':")
for index, doc in search_results:
    print(f"Document {index}: {doc}")
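Time complexity of search is roughly O(n * m), where n is the number of documents and m is the average document length, because every query lowercases and scans each document in full.
Space complexity is O(n) document references, or O(n * m) characters in total, to store the documents; a search adds only a temporary lowercase copy of one document at a time plus the results list.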
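One way to trim the per-query work, sketched here under the assumption that the document list does not change after construction, is to lowercase every document once up front. The class name PreloweredSearchEngine is hypothetical and not part of the original example. The search is still a linear scan, so the O(n * m) bound stays; only the repeated lowercasing goes away, at the cost of a second copy of the text.

```python
class PreloweredSearchEngine:
    """Hypothetical variant: lowercases documents once instead of per query."""

    def __init__(self, documents):
        self.documents = list(documents)
        # Extra O(n * m) space buys cheaper repeated searches.
        self._lowered = [doc.lower() for doc in self.documents]

    def search(self, query):
        needle = query.lower()
        return [
            (index, self.documents[index])
            for index, lowered in enumerate(self._lowered)
            if needle in lowered
        ]


engine = PreloweredSearchEngine(["Apple pie recipe", "Banana smoothie", "Cherry tart"])
print(engine.search("APPLE"))  # [(0, 'Apple pie recipe')]
```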
Common mistake: comparing the query and the documents without normalizing case, which misses matches such as "apple" against "Apple pie recipe".
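The effect of case handling can be seen in a small comparison. The three helper functions below are hypothetical names introduced only for this illustration. The last one uses Python's built-in str.casefold(), which is a more aggressive form of lowercasing intended for caseless matching; whether it is needed depends on the languages in the collection.

```python
documents = ["Apple pie recipe", "Straße map"]


def search_exact(docs, query):
    # No case normalization at all.
    return [(i, d) for i, d in enumerate(docs) if query in d]


def search_lower(docs, query):
    # Same approach as SimpleSearchEngine: lowercase both sides.
    q = query.lower()
    return [(i, d) for i, d in enumerate(docs) if q in d.lower()]


def search_casefold(docs, query):
    # casefold() also maps characters such as the German sharp s to "ss".
    q = query.casefold()
    return [(i, d) for i, d in enumerate(docs) if q in d.casefold()]


print(search_exact(documents, "apple"))       # []
print(search_lower(documents, "apple"))       # [(0, 'Apple pie recipe')]
print(search_lower(documents, "STRASSE"))     # []
print(search_casefold(documents, "STRASSE"))  # [(1, 'Straße map')]
```

For plain English text, lower() and casefold() behave the same, so the original approach is usually enough.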
Use this simple search for small collections; for large data, use indexes or specialized tools.
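As a rough idea of what "using an index" can mean, here is a minimal sketch of an inverted index, which maps each word to the set of documents containing it. The class name InvertedIndexSearchEngine, the tokenizer, and the rule that a document must contain every query word are all assumptions made for this illustration, not part of the original example.

```python
import re
from collections import defaultdict


class InvertedIndexSearchEngine:
    """Hypothetical word-level index; not part of the original example."""

    def __init__(self, documents):
        self.documents = list(documents)
        self.index = defaultdict(set)
        # Build the index once: word -> set of document ids.
        for doc_id, document in enumerate(self.documents):
            for word in self._tokenize(document):
                self.index[word].add(doc_id)

    @staticmethod
    def _tokenize(text):
        # Assumption: a "word" is a run of letters, digits, or underscores.
        return re.findall(r"\w+", text.lower())

    def search(self, query):
        words = self._tokenize(query)
        if not words:
            return []
        # Keep only documents that contain every query word.
        matching = set.intersection(*(self.index.get(w, set()) for w in words))
        return [(doc_id, self.documents[doc_id]) for doc_id in sorted(matching)]


documents = [
    "Machine learning basics",
    "Deep learning introduction",
    "Natural language processing overview",
]
engine = InvertedIndexSearchEngine(documents)
print(engine.search("learning"))       # [(0, 'Machine learning basics'), (1, 'Deep learning introduction')]
print(engine.search("Deep Learning"))  # [(1, 'Deep learning introduction')]
print(engine.search("learn"))          # []
```

Unlike the substring search, this sketch matches whole words only, so "learn" does not find "learning". The trade-off is that a lookup no longer scans every document, at the cost of building and storing the index up front.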
Information retrieval helps find relevant documents from many options.
Simple search checks whether the query string appears in each document.
Case-insensitive search improves matching results.