
How to Stream Responses in Langchain: Simple Guide

To stream responses in Langchain, enable the streaming parameter when you set up your language model and provide a callback_manager to handle tokens as they arrive. This lets your app receive and display output incrementally instead of waiting for the full response.
📝

Syntax

Streaming in Langchain requires setting streaming=True when creating the language model instance. You also need to pass a callback_manager that listens for new tokens and processes them as they stream in.

Key parts:

  • streaming=True: Enables streaming mode.
  • callback_manager: Manages callbacks for token events.
  • on_llm_new_token: Function called for each new token received.
python
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.base import BaseCallbackHandler
from langchain.callbacks.manager import CallbackManager

class StreamHandler(BaseCallbackHandler):
    # Called once for every token the model emits
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(token, end='', flush=True)

handler = StreamHandler()
callback_manager = CallbackManager([handler])
# streaming=True makes the model emit tokens as they are generated
chat = ChatOpenAI(streaming=True, callback_manager=callback_manager)

response = chat.predict('Say hello in a streaming way.')
Output
Hello! (illustrative; the model's actual reply varies and prints one token at a time)
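
Depending on your installed Langchain version, the model constructor may also accept a plain callbacks list, so you can skip building a CallbackManager yourself. This is a minimal sketch under that assumption, not the only way to wire it up:

python
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.base import BaseCallbackHandler

class StreamHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(token, end='', flush=True)

# Assumption: your Langchain version accepts a callbacks list directly
chat = ChatOpenAI(streaming=True, callbacks=[StreamHandler()])
response = chat.predict('Say hello in a streaming way.')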
💻

Example

This example shows how to create a streaming chat model with Langchain that prints tokens as they arrive. The StreamHandler class handles each new token by printing it immediately, producing real-time output.

python
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.base import BaseCallbackHandler
from langchain.callbacks.manager import CallbackManager

class StreamHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(token, end='', flush=True)

handler = StreamHandler()
callback_manager = CallbackManager([handler])
chat = ChatOpenAI(streaming=True, callback_manager=callback_manager)

response = chat.predict('Explain streaming responses simply.')
Output
Streaming sends the model's reply in small pieces as it is generated, so you can read it before the full answer is finished. (illustrative; actual output varies)
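
If you need the streamed text somewhere other than the console (for example, to update a UI or assemble the full reply as it arrives), the handler can collect tokens instead of printing them. A minimal sketch; the BufferedStreamHandler name and its text attribute are illustrative, not part of Langchain:

python
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.base import BaseCallbackHandler
from langchain.callbacks.manager import CallbackManager

class BufferedStreamHandler(BaseCallbackHandler):
    def __init__(self):
        self.text = ''  # accumulates the response so far

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.text += token
        # In a real app you might push self.text to a UI here

handler = BufferedStreamHandler()
chat = ChatOpenAI(streaming=True, callback_manager=CallbackManager([handler]))
chat.predict('Explain streaming responses simply.')
print(handler.text)  # full response, assembled from streamed tokens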
⚠️

Common Pitfalls

Common mistakes when streaming responses in Langchain include:

  • Not setting streaming=True, so the model waits to return the full response.
  • Omitting the callback_manager, so no tokens are received incrementally.
  • Using a callback handler that does not implement on_llm_new_token, missing token events.

Always ensure your callback handler properly processes tokens and that streaming is enabled.

python
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.base import BaseCallbackHandler
from langchain.callbacks.manager import CallbackManager

# Wrong: streaming disabled, no callbacks
chat = ChatOpenAI()
response = chat.predict('Hello')  # waits for the full response

# Right: streaming enabled with a callback handler
class StreamHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(token, end='', flush=True)

handler = StreamHandler()
callback_manager = CallbackManager([handler])
chat = ChatOpenAI(streaming=True, callback_manager=callback_manager)
response = chat.predict('Hello')  # tokens print as they arrive
Output
Hello! (illustrative; the actual reply varies and prints token by token)
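
The third pitfall is easy to miss because nothing errors out: Langchain only calls the exact hook name on_llm_new_token, so a handler that defines a differently named method silently receives no tokens. A short sketch of the mistake, using a hypothetical method name:

python
from langchain.callbacks.base import BaseCallbackHandler

class SilentHandler(BaseCallbackHandler):
    # Wrong: Langchain never calls this method, so no tokens are printed
    def handle_token(self, token: str, **kwargs) -> None:
        print(token, end='', flush=True)

class WorkingHandler(BaseCallbackHandler):
    # Right: override the exact hook Langchain invokes for each token
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(token, end='', flush=True)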
📊

Quick Reference

  • streaming=True: Enables streaming mode for incremental output.
  • callback_manager: Handles events such as new tokens during streaming.
  • on_llm_new_token(token): Callback method called for each new token.
  • print(token, end='', flush=True): Prints tokens immediately for real-time display.
✅

Key Takeaways

  • Enable streaming by setting streaming=True in your Langchain model.
  • Use a callback_manager with a handler implementing on_llm_new_token to process tokens as they arrive.
  • Streaming lets you show output token by token instead of waiting for the full response.
  • Without streaming or callbacks, Langchain returns the full response only after completion.
  • Print tokens with flush=True so they appear immediately in your app or console.