How to Stream Tokens in LangChain: A Simple Guide
To stream tokens in LangChain, enable streaming in the language model by setting streaming=True and provide a callback_manager with a handler such as StreamingStdOutCallbackHandler. This setup emits tokens as they are generated, allowing real-time output display.
Syntax
Streaming tokens in LangChain requires configuring the language model with streaming enabled and a callback manager to handle token events.
- streaming=True: Turns on token streaming.
- callback_manager: Manages callbacks for streaming events.
- StreamingStdOutCallbackHandler(): A built-in handler that prints tokens as they stream.
```python
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.callbacks.manager import CallbackManager

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
chat = ChatOpenAI(streaming=True, callback_manager=callback_manager)
response = chat.predict("Say hello in a streaming way.")
```
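The stdout handler works because it implements LangChain's token callback. To handle tokens yourself (for example, to forward them to a web client instead of printing), you can subclass BaseCallbackHandler and override on_llm_new_token, which is invoked once per streamed token. Below is a minimal sketch; CollectTokensHandler is a hypothetical name, and the code assumes the BaseCallbackHandler interface from langchain.callbacks.base.

```python
from langchain.callbacks.base import BaseCallbackHandler
from langchain.callbacks.manager import CallbackManager

class CollectTokensHandler(BaseCallbackHandler):
    """Hypothetical handler that collects streamed tokens into a list."""

    def __init__(self):
        self.tokens = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Invoked by LangChain for each new token when streaming=True
        self.tokens.append(token)

# Use it exactly like StreamingStdOutCallbackHandler:
handler = CollectTokensHandler()
callback_manager = CallbackManager([handler])
# After a model call, handler.tokens holds every streamed token in order.
```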
Example
This example shows how to stream tokens from OpenAI's chat model using LangChain. Tokens print to the console as they arrive.
```python
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.callbacks.manager import CallbackManager

# Create a callback manager with the streaming stdout handler
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Initialize ChatOpenAI with streaming enabled
chat = ChatOpenAI(streaming=True, callback_manager=callback_manager)

# Send a prompt and stream the response tokens
response = chat.predict("Tell me a short joke.")
print(f"\nFull response: {response}")
```
Output
Why did the scarecrow win an award? Because he was outstanding in his field!
Full response: Why did the scarecrow win an award? Because he was outstanding in his field!
Common Pitfalls
Common mistakes when streaming tokens in LangChain include:
- Not setting streaming=True, so tokens are not streamed.
- Omitting the callback_manager, so no handler receives token events.
- Using a callback handler that does not support streaming output.
- Expecting the full response immediately instead of handling streamed tokens.
Always pair streaming=True with a proper callback manager and handler.
```python
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.callbacks.manager import CallbackManager

# Wrong: streaming enabled but no callback manager
chat_wrong = ChatOpenAI(streaming=True)
response_wrong = chat_wrong.predict("Hello")  # This will not stream tokens

# Right: streaming with a callback manager
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
chat_right = ChatOpenAI(streaming=True, callback_manager=callback_manager)
response_right = chat_right.predict("Hello")  # Tokens stream to stdout
```
Quick Reference
| Parameter | Description | Example |
|---|---|---|
| streaming | Enable token streaming | streaming=True |
| callback_manager | Handles streaming token events | CallbackManager([StreamingStdOutCallbackHandler()]) |
| StreamingStdOutCallbackHandler | Prints tokens as they stream | StreamingStdOutCallbackHandler() |
| predict() | Sends a prompt; tokens stream via callbacks while the full response is returned | chat.predict("Hello") |
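One hedged note: the callback_manager parameter reflects older LangChain releases. In newer versions (assuming langchain >= 0.1 with the langchain-openai package installed), chat models also expose a .stream() method that yields response chunks directly, with no callback manager required. A minimal sketch:

```python
# Assumes langchain-openai is installed and OPENAI_API_KEY is set.
from langchain_openai import ChatOpenAI

chat = ChatOpenAI()
for chunk in chat.stream("Tell me a short joke."):
    # Each chunk carries the newly generated piece of the response
    print(chunk.content, end="", flush=True)
print()
```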
Key Takeaways
- Enable streaming by setting streaming=True in your language model.
- Use a callback manager with a streaming handler to receive tokens as they are generated.
- StreamingStdOutCallbackHandler is a simple way to print tokens live to the console.
- Without a callback manager, streamed tokens will not be output in real time.
- Streaming lets your app show partial results immediately, improving the user experience.