Model Pipeline - Information retrieval basics
This pipeline shows how a simple information retrieval system works. It takes user queries, processes them, finds matching documents, and ranks them to show the best results.
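The four stages named above (take a query, process it, find matching documents, rank them) can be sketched in a few lines. This is a minimal illustration, not the page's actual model: the function names `preprocess`, `retrieve`, and `rank`, the sample documents, and the scoring rule (counting document tokens that are query terms) are all assumptions made for the example.

```python
def preprocess(text):
    """Lowercase the text and split it into whitespace-separated tokens."""
    return text.lower().split()


def retrieve(query, documents):
    """Return (score, document) pairs for documents sharing a term with the query."""
    query_terms = set(preprocess(query))
    scored = []
    for doc in documents:
        # Assumed scoring rule: count the document tokens that are query terms.
        score = sum(1 for term in preprocess(doc) if term in query_terms)
        if score > 0:
            scored.append((score, doc))
    return scored


def rank(scored):
    """Order matches by descending score; ties keep their original order."""
    ordered = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ordered]


# Hypothetical sample data for illustration.
documents = ["Apple pie recipe", "Banana smoothie", "apple tart with apple jam"]
results = rank(retrieve("Apple", documents))
print(results)  # ['apple tart with apple jam', 'Apple pie recipe']
```

Lowercasing in `preprocess` is what lets the query "Apple" match "apple tart"; a real system would likely replace the counting rule with a learned or statistical weighting.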
Figure: training loss (y-axis, 0.3 to 0.7) plotted against epochs 1 to 5 (x-axis).
| Epoch | Loss ↓ | Accuracy ↑ | Observation |
|---|---|---|---|
| 1 | 0.65 | 0.55 | Initial retrieval model with random weights |
| 2 | 0.50 | 0.65 | Model learns better word importance |
| 3 | 0.40 | 0.75 | Improved ranking with term weighting |
| 4 | 0.35 | 0.80 | Model converges with stable ranking |
| 5 | 0.33 | 0.82 | Final fine-tuning of retrieval weights |
Which check tests whether 'apple' appears in doc (case-insensitive)?

- The correct option calls `doc.lower()` and then checks membership with `in`.
- `if doc.contains('apple'):` uses a non-existent method, `contains`.
- `if 'Apple' == doc:` compares the whole string instead of testing membership.
- `if doc.find('apple') == -1:` is true when `find` returns -1, which means not found, so the logic is reversed.

Final answer: combine `lower()` with `in` for a case-insensitive check.
Quick tip: call `lower()` before checking membership.
Common mistake: misreading `find()` return values.

The same check applied to a list of documents:

    documents = ['Apple pie recipe', 'Banana smoothie', 'apple tart']
    query = 'apple'
    results = [doc for doc in documents if query.lower() in doc.lower()]
    print(results)
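To see why only the `lower()` plus `in` form works, the candidate checks can be run against one sample string. The value of `doc` below is an assumption chosen for illustration, and `hasattr` stands in for calling `doc.contains(...)` so that the script does not stop with an `AttributeError`.

```python
doc = "Apple pie recipe"  # hypothetical sample document

# Lowercasing first makes the membership test case-insensitive.
correct = "apple" in doc.lower()

# str has no contains() method, so doc.contains('apple') would raise AttributeError.
has_contains = hasattr(doc, "contains")

# Equality compares the entire string, so it is False unless doc is exactly 'Apple'.
whole_match = "Apple" == doc

# find() is case-sensitive and returns -1 when the substring is absent.
find_index = doc.find("apple")

print(correct, has_contains, whole_match, find_index)  # True False False -1
```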
    docs = ['Data science', 'Big Data', 'Machine learning']
    query = 'data'
    results = [d for d in docs if d.find(query) != -1]
    print(results)
`find` behavior: the `find` method is case-sensitive, so searching for 'data' in 'Data science' returns -1 (not found).

Step by step: `'Data science'.find('data')` returns -1 because of the uppercase 'D'. Similarly, `'Big Data'.find('data')` returns -1. 'Machine learning' does not contain 'data' at all. So `results` is empty.

Final answer: the `find` method is case-sensitive, so it misses 'Data science' -> Option A.
Quick tip: `find()` is case-sensitive; use `lower()` on both strings first.
Common mistakes: assuming `find()` ignores case; misreading `find()` return values.

    docs = ['Data Science is fun', 'I love machine learning', 'Deep learning and data']

The correct option uses `setdefault` to initialize the lists and appends indices correctly with lowercase words. The other options fail as follows.

    word_docs = {}
    for i, doc in enumerate(docs):
        for word in doc.split():
            word_docs[word].append(i)

This never initializes the lists, so the first `append` raises `KeyError`, and it ignores case.

    word_docs = {word: i for i, doc in enumerate(docs) for word in doc.lower().split()}

This creates a dict that keeps only the last index for each word, not a list of indices.

    word_docs = {}
    for doc in docs:
        for word in doc.lower().split():
            word_docs[word] = doc

This overwrites each value with a document string, not a list of indices.

Final answer: use `setdefault` and lowercase words for a correct mapping.
Quick tip: use `setdefault` to build a list for each word.
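The text above describes the correct option without showing its code. A minimal sketch of what such a mapping could look like, assuming the goal is a dict from each lowercase word to the list of indices of the documents containing it:

```python
docs = ['Data Science is fun', 'I love machine learning', 'Deep learning and data']

word_docs = {}
for i, doc in enumerate(docs):
    for word in doc.lower().split():
        # setdefault returns the existing list for word, or stores and returns a new one.
        word_docs.setdefault(word, []).append(i)

print(word_docs['data'])      # [0, 2]
print(word_docs['learning'])  # [1, 2]
```

Note that in this sketch a word occurring twice in the same document would have that index appended twice; storing sets instead of lists is one way to avoid that.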