
How to Stream Tokens in LangChain: Simple Guide

To stream tokens in LangChain, enable streaming on the language model by setting `streaming=True` and provide a `callback_manager` with a handler such as `StreamingStdOutCallbackHandler`. With this setup, tokens are emitted as they are generated, so your application can display output in real time.

Syntax

Streaming tokens in LangChain requires configuring the language model with streaming enabled and a callback manager to handle token events.

  • `streaming=True`: Turns on token streaming.
  • `callback_manager`: Manages callbacks for streaming events.
  • `StreamingStdOutCallbackHandler()`: A built-in handler that prints tokens as they stream.
```python
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.callbacks.manager import CallbackManager

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
chat = ChatOpenAI(streaming=True, callback_manager=callback_manager)

response = chat.predict("Say hello in a streaming way.")
```
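
Depending on your LangChain version, you can often skip the explicit `CallbackManager` and pass handlers through the `callbacks` parameter instead. A minimal sketch, assuming a release that accepts `callbacks` on the model constructor:

```python
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Equivalent setup without wrapping the handler in a CallbackManager
chat = ChatOpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()])
response = chat.predict("Say hello in a streaming way.")
```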

Example

This example shows how to stream tokens from OpenAI's chat model using LangChain. Tokens print to the console as they arrive.

```python
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.callbacks.manager import CallbackManager

# Create a callback manager with the streaming stdout handler
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Initialize ChatOpenAI with streaming enabled
chat = ChatOpenAI(streaming=True, callback_manager=callback_manager)

# Send a prompt; tokens print to stdout as they arrive
response = chat.predict("Tell me a short joke.")

print(f"\nFull response: {response}")
```
Output

```
Why did the scarecrow win an award? Because he was outstanding in his field!
Full response: Why did the scarecrow win an award? Because he was outstanding in his field!
```
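
Newer LangChain releases also expose a `.stream()` method on chat models, which lets you iterate over tokens directly without configuring callbacks at all. A minimal sketch, assuming a version that implements the Runnable interface:

```python
from langchain.chat_models import ChatOpenAI

chat = ChatOpenAI()

# Each chunk is a message chunk; its .content holds the newly generated text
for chunk in chat.stream("Tell me a short joke."):
    print(chunk.content, end="", flush=True)
print()
```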

Common Pitfalls

Common mistakes when streaming tokens in LangChain include:

  • Not setting `streaming=True`, so tokens are not streamed.
  • Omitting the `callback_manager`, so no handler receives token events.
  • Using a callback handler that does not implement `on_llm_new_token`, so streamed tokens are silently dropped (see the custom handler sketch after the code below).
  • Expecting the full response immediately instead of handling tokens as they arrive.

Always pair `streaming=True` with a proper callback manager and handler.

```python
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.callbacks.manager import CallbackManager

# Wrong: streaming enabled but no callback manager,
# so no handler ever receives the tokens
chat_wrong = ChatOpenAI(streaming=True)
response_wrong = chat_wrong.predict("Hello")  # Does not stream tokens to stdout

# Right: streaming paired with a callback manager and streaming handler
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
chat_right = ChatOpenAI(streaming=True, callback_manager=callback_manager)
response_right = chat_right.predict("Hello")  # Tokens stream to stdout
```
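
If printing to stdout is not enough (for example, you want to push tokens to a UI or collect them for later), you can subclass `BaseCallbackHandler` and implement `on_llm_new_token`, which LangChain calls once per streamed token. A minimal sketch; the `TokenCollector` name is just for illustration:

```python
from langchain.callbacks.base import BaseCallbackHandler
from langchain.callbacks.manager import CallbackManager
from langchain.chat_models import ChatOpenAI

class TokenCollector(BaseCallbackHandler):
    """Hypothetical handler that accumulates streamed tokens."""

    def __init__(self):
        self.tokens = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Called once for every token when streaming=True
        self.tokens.append(token)

collector = TokenCollector()
chat = ChatOpenAI(streaming=True, callback_manager=CallbackManager([collector]))
response = chat.predict("Hello")
print(collector.tokens)  # Individual tokens in arrival order
```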

Quick Reference

| Parameter | Description | Example |
| --- | --- | --- |
| `streaming` | Enables token streaming | `streaming=True` |
| `callback_manager` | Handles streaming token events | `CallbackManager([StreamingStdOutCallbackHandler()])` |
| `StreamingStdOutCallbackHandler` | Prints tokens as they stream | `StreamingStdOutCallbackHandler()` |
| `predict()` | Sends a prompt and returns the full response (tokens stream via callbacks) | `chat.predict("Hello")` |

Key Takeaways

  • Enable streaming by setting `streaming=True` on your language model.
  • Use a callback manager with a streaming handler to receive tokens as they are generated.
  • `StreamingStdOutCallbackHandler` is a simple way to print tokens live to the console.
  • Without a callback manager, streamed tokens are never surfaced in real time.
  • Streaming lets your app show partial results immediately, improving user experience.