Performance: Streaming in production
Streaming affects how quickly users see partial results and how smoothly the UI updates during data processing.
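// Streaming chunks: update the UI as each chunk arrives
const stream = langchain.stream({ input: userInput });
stream.on('data', chunk => updateUI(chunk));

// Full response wait: nothing is shown until the whole response is ready
const response = await langchain.call({ input: userInput });
display(response);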
| Pattern | DOM Operations | Reflows | Paint Cost | Verdict |
|---|---|---|---|---|
| Full response wait | Single large DOM update | 1 reflow after full data | High paint cost at once | Bad |
| Streaming chunks | Multiple small DOM updates | Multiple reflows but smaller | Lower paint cost per chunk | Good |
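The difference in perceived latency between the two patterns can be sketched with a simulated token source. This is a minimal illustration, not a benchmark: `FakeClock`, `generate_tokens`, and the 0.25 s per-token delay are hypothetical stand-ins for a real model, chosen so the numbers are deterministic.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class FakeClock:
    """Hypothetical simulated clock, so no real sleeping is needed."""
    now: float = 0.0

    def advance(self, seconds: float) -> None:
        self.now += seconds


def generate_tokens(tokens: List[str], seconds_per_token: float,
                    clock: FakeClock) -> Iterator[str]:
    """Stand-in for a model: each token costs a fixed amount of simulated time."""
    for token in tokens:
        clock.advance(seconds_per_token)
        yield token


def first_visible_streaming(tokens: List[str], seconds_per_token: float) -> float:
    """Simulated time at which the user sees the first chunk."""
    clock = FakeClock()
    for _ in generate_tokens(tokens, seconds_per_token, clock):
        return clock.now  # the first chunk is rendered as soon as it arrives
    return clock.now


def first_visible_full_wait(tokens: List[str], seconds_per_token: float) -> float:
    """Simulated time at which the user sees anything when waiting for everything."""
    clock = FakeClock()
    "".join(generate_tokens(tokens, seconds_per_token, clock))
    return clock.now


tokens = ["Hello", " ", "world", "!"]
streaming_latency = first_visible_streaming(tokens, 0.25)
full_wait_latency = first_visible_full_wait(tokens, 0.25)
print(f"streaming: first output after {streaming_latency}s")
print(f"full wait: first output after {full_wait_latency}s")
```

Total generation time is the same in both cases; what streaming changes is how soon the first output becomes visible.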
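What does streaming=True in LangChain do?

from langchain.callbacks.base import BaseCallbackHandler
from langchain.llms import OpenAI

class PrintTokens(BaseCallbackHandler):
    # Called once for every token the model emits while streaming is enabled.
    def on_llm_new_token(self, token: str, **kwargs):
        print(token, end='')

llm = OpenAI(streaming=True, callbacks=[PrintTokens()])
llm('Hello world')

# callbacks must be a list of handlers, even when there is only one.
llm = OpenAI(streaming=True, callbacks=[PrintTokens()])
llm('Test')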
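Since every token triggers its own update, one way to trade a little latency for fewer reflows is to buffer tokens and flush them in small batches. The sketch below assumes this batching approach; `TokenBatcher` and its `batch_size` parameter are hypothetical names, not part of LangChain. It is written as a plain class so it runs without LangChain installed; in a real application it would presumably subclass `BaseCallbackHandler` and be passed in the `callbacks` list.

```python
from typing import Callable, List


class TokenBatcher:
    """Hypothetical handler that groups tokens before pushing them to the UI."""

    def __init__(self, flush_fn: Callable[[str], None], batch_size: int = 3) -> None:
        self._flush_fn = flush_fn      # e.g. a function that updates the UI
        self._batch_size = batch_size  # tokens to collect per UI update
        self._buffer: List[str] = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Same method name as the callback hook used in the example above.
        self._buffer.append(token)
        if len(self._buffer) >= self._batch_size:
            self._emit()

    def on_llm_end(self, *args, **kwargs) -> None:
        # Flush whatever is left when generation finishes.
        if self._buffer:
            self._emit()

    def _emit(self) -> None:
        self._flush_fn("".join(self._buffer))
        self._buffer.clear()


# Simulate a token stream by hand; a list stands in for the UI.
updates: List[str] = []
batcher = TokenBatcher(flush_fn=updates.append, batch_size=3)
for token in ["Str", "eam", "ing", " ", "works"]:
    batcher.on_llm_new_token(token)
batcher.on_llm_end()
print(updates)
```

Here five tokens produce two UI updates instead of five. A larger `batch_size` means fewer updates but a longer wait before each one appears.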