Information retrieval basics in NLP

Introduction

Information retrieval (IR) is the task of quickly finding the items relevant to a query in a large collection, such as web pages or a document archive. Everyday examples include:

Searching for a recipe in a large cookbook.
Finding a specific article on a news website.
Looking up a product in an online store.
Finding emails in your inbox by keywords.
Searching for a book in a digital library.
Syntax
NLP
class SimpleSearchEngine:
    def __init__(self, documents):
        self.documents = documents

    def search(self, query):
        results = []
        for index, document in enumerate(self.documents):
            if query.lower() in document.lower():
                results.append((index, document))
        return results

This class stores a list of documents and checks each one for the query string.

The search method returns an (index, document) tuple for every document that contains the query as a substring, ignoring case.
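Because the check is a substring test, a query can also match inside a longer word. The sketch below illustrates this with made-up document titles and contrasts it with a whole-word variant. The `search_whole_word` helper and its regex tokenization are assumptions added for illustration, not part of the class above.

```python
import re


class SimpleSearchEngine:
    def __init__(self, documents):
        self.documents = documents

    def search(self, query):
        results = []
        for index, document in enumerate(self.documents):
            if query.lower() in document.lower():
                results.append((index, document))
        return results


def search_whole_word(documents, word):
    """Hypothetical helper: match only when `word` is a complete token."""
    target = word.lower()
    results = []
    for index, document in enumerate(documents):
        # Split the lowercased text into runs of word characters.
        tokens = re.findall(r"\w+", document.lower())
        if target in tokens:
            results.append((index, document))
    return results


documents = ["Apple pie recipe", "Pineapple juice", "Cherry tart"]
substring_hits = SimpleSearchEngine(documents).search("apple")
whole_word_hits = search_whole_word(documents, "apple")
print(substring_hits)   # [(0, 'Apple pie recipe'), (1, 'Pineapple juice')]
print(whole_word_hits)  # [(0, 'Apple pie recipe')]
```

Which behavior is preferable depends on the use case: substring matching is forgiving with partial queries, while whole-word matching avoids hits such as 'Pineapple' for 'apple'.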

Examples
Search for 'apple' finds the first document because it contains 'Apple'.
NLP
documents = ["Apple pie recipe", "Banana smoothie", "Cherry tart"]
search_engine = SimpleSearchEngine(documents)
results = search_engine.search("apple")
print(results)  # [(0, 'Apple pie recipe')]
When there are no documents, search returns an empty list.
NLP
documents = []
search_engine = SimpleSearchEngine(documents)
results = search_engine.search("anything")
print(results)  # []
Search works with just one document and finds it if it matches.
NLP
documents = ["Only one document"]
search_engine = SimpleSearchEngine(documents)
results = search_engine.search("document")
print(results)  # [(0, 'Only one document')]
Search scans the whole list, so it also finds a match in the last document.
NLP
documents = ["Start here", "Middle part", "End now"]
search_engine = SimpleSearchEngine(documents)
results = search_engine.search("end")
print(results)  # [(2, 'End now')]
Sample Program

This program creates a simple search engine that looks for a word in a list of documents and prints the matching documents with their indexes.

NLP
class SimpleSearchEngine:
    def __init__(self, documents):
        self.documents = documents

    def search(self, query):
        results = []
        for index, document in enumerate(self.documents):
            if query.lower() in document.lower():
                results.append((index, document))
        return results

# Create a list of documents
documents = [
    "Machine learning basics",
    "Deep learning introduction",
    "Natural language processing overview",
    "Information retrieval techniques",
    "Data science and AI"
]

# Initialize the search engine with documents
search_engine = SimpleSearchEngine(documents)

# Search for the word 'learning'
search_results = search_engine.search("learning")

# Print the results
print("Search results for 'learning':")
for index, doc in search_results:
    print(f"Document {index}: {doc}")
Output
Search results for 'learning':
Document 0: Machine learning basics
Document 1: Deep learning introduction
Important Notes

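Time complexity of a search is roughly O(n * m), where n is the number of documents and m is the average document length, because every document is lowercased and scanned for each query.

Space complexity is O(n * m) for the stored document text; each search also builds a temporary lowercase copy of one document at a time.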

Common mistake: Not handling case differences can miss matches.
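A minimal sketch of this mistake, using a made-up document and query: Python's `in` operator is case-sensitive on strings, so the match only succeeds once both sides are lowercased.

```python
document = "Apple pie recipe"
query = "apple"

# Case-sensitive check: 'apple' is not found because the text has 'Apple'.
case_sensitive_match = query in document

# Lowercasing both sides makes the comparison case-insensitive.
case_insensitive_match = query.lower() in document.lower()

print(case_sensitive_match)    # False
print(case_insensitive_match)  # True
```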

Use this simple search for small collections; for large data, use indexes or specialized tools.
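One common form of index is an inverted index, which maps each word to the documents that contain it so a query does not have to scan every document. The sketch below is a minimal illustration under simple assumptions: the function names `build_inverted_index` and `lookup` are hypothetical, tokenization is a plain regex split, and it matches whole words only, unlike the substring search above.

```python
import re
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercased word to the set of document indexes containing it."""
    index = defaultdict(set)
    for doc_id, document in enumerate(documents):
        for token in re.findall(r"\w+", document.lower()):
            index[token].add(doc_id)
    return index


def lookup(index, documents, word):
    """Return (index, document) pairs for documents containing `word`."""
    doc_ids = sorted(index.get(word.lower(), set()))
    return [(doc_id, documents[doc_id]) for doc_id in doc_ids]


documents = [
    "Machine learning basics",
    "Deep learning introduction",
    "Natural language processing overview",
    "Information retrieval techniques",
    "Data science and AI",
]

index = build_inverted_index(documents)
learning_hits = lookup(index, documents, "Learning")
print(learning_hits)
# [(0, 'Machine learning basics'), (1, 'Deep learning introduction')]
```

Building the index costs one pass over all the text, but each later lookup is a dictionary access rather than a scan of every document.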

Summary

Information retrieval helps find relevant documents from many options.

Simple search checks whether the query string appears as a substring of each document.

Lowercasing both the query and the documents makes the search case-insensitive, so 'apple' matches 'Apple'.