Performance: Why the RAG chain connects retrieval to generation
MEDIUM IMPACT
How the RAG chain connects retrieval to generation determines how soon generation can start once documents are available, and therefore how long the user waits for a response.
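
Streaming retrieval into generation (lower wait time):

    async function handleQuery(query) {
      // Start retrieval without waiting for the full result set.
      const docsStream = retrieveDocumentsStream(query);
      // Generation consumes documents as they arrive.
      const response = await generateResponseWithStream(docsStream, query);
      return response;
    }

Sequential retrieval, then generation (higher wait time):

    async function handleQuery(query) {
      // Wait for every document before building the prompt.
      const docs = await retrieveDocuments(query);
      const response = await generateResponse(
        'Use these docs: ' + docs.join(', ') + ' to answer: ' + query
      );
      return response;
    }
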
| Pattern | Data Fetching | Generation Start | User Wait Time | Verdict |
|---|---|---|---|---|
| Sequential retrieval then generation | Full retrieval first | After retrieval completes | High wait time, blocks UI | [X] Bad |
| Streaming retrieval into generation | Partial retrieval streams | Starts immediately with first data | Lower wait time, smoother UI | [OK] Good |
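
The difference between the two rows can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: `retrieveDocumentsStream` is a simulated retriever built on timers, `sequentialChain` and `streamingChain` are hypothetical stand-ins for the two patterns, and the "generation" step just echoes each document. It does not call a real retriever or model; it only shows how consuming an async iterator lets the first output appear before retrieval has finished.

```javascript
// Hypothetical stand-ins: a real retriever and model client would replace these.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Assumed retriever: yields each document as soon as it is found.
async function* retrieveDocumentsStream(query) {
  for (const id of [1, 2, 3]) {
    await sleep(25); // simulated per-document lookup latency
    yield `doc${id} about ${query}`;
  }
}

// Sequential pattern: collect every document before producing any output.
async function* sequentialChain(query) {
  const docs = [];
  for await (const doc of retrieveDocumentsStream(query)) docs.push(doc);
  for (const doc of docs) yield `used ${doc}`;
}

// Streaming pattern: produce output for each document as it arrives.
async function* streamingChain(query) {
  for await (const doc of retrieveDocumentsStream(query)) yield `used ${doc}`;
}

// Measure the time until the first output chunk and collect all chunks.
async function timeToFirstChunk(chain, query) {
  const start = Date.now();
  let first = null;
  const chunks = [];
  for await (const chunk of chain(query)) {
    if (first === null) first = Date.now() - start;
    chunks.push(chunk);
  }
  return { first, chunks };
}
```

Calling `await timeToFirstChunk(streamingChain, 'rag')` and `await timeToFirstChunk(sequentialChain, 'rag')` should return the same chunks, with a smaller `first` value for the streaming variant, since the sequential variant waits for all three simulated lookups before yielding anything. In a real chain the gain depends on whether the generation step can do useful work on partial context.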