
Streaming in production in LangChain

Introduction

Streaming lets your app display results incrementally as the model generates them, instead of waiting for the full response. This makes the app feel faster and more responsive.

Typical situations where streaming helps:

When you want to display partial answers from a language model as soon as they are ready.
When handling long responses that take time to generate, so users see progress.
When building chatbots or assistants that reply in real time.
When you want to reduce perceived waiting time and improve user experience in production apps.
Syntax
LangChain
from langchain.llms import OpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()])
response = llm("Tell me a story.")

Set streaming=True to enable streaming mode.

Pass a list of callbacks to handle streamed tokens as they arrive.
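The callback contract can be illustrated without a live model call. Assuming only that a handler exposes an on_llm_new_token method (the hook LangChain invokes per token), a simple loop can stand in for the model's generation; the class and function names here are illustrative:

```python
class PrintHandler:
    """Stand-in for a streaming callback handler (illustrative, no LangChain needed)."""

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Print each token immediately, with no trailing newline.
        print(token, end="", flush=True)


def simulate_stream(tokens, handler):
    # Plays the model's role: deliver one token at a time to the callback.
    for tok in tokens:
        handler.on_llm_new_token(tok)


simulate_stream(["Once", " upon", " a", " time."], PrintHandler())
```

In a real run, LangChain calls on_llm_new_token for you as each token arrives from the provider; the handler only decides what to do with it.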

Examples
This example streams the poem tokens to the console as they arrive.
LangChain
from langchain.llms import OpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()])
response = llm("Write a poem about spring.")
Custom callback class to handle each new token received from the model.
LangChain
from langchain.llms import OpenAI
from langchain.callbacks.base import BaseCallbackHandler

class MyStreamHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"Received token: {token}")

handler = MyStreamHandler()
llm = OpenAI(streaming=True, callbacks=[handler])
llm("Explain photosynthesis.")
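Beyond printing, a handler can accumulate tokens so the complete text is also available once streaming ends. This sketch shows the pattern without calling a model; the class name is illustrative:

```python
class CollectingHandler:
    """Accumulates streamed tokens into the full response text (illustrative)."""

    def __init__(self):
        self.tokens = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Store each token as it arrives.
        self.tokens.append(token)

    @property
    def text(self) -> str:
        # Join the tokens back into the complete response.
        return "".join(self.tokens)


handler = CollectingHandler()
for tok in ["Photo", "synthesis", " converts", " light."]:
    handler.on_llm_new_token(tok)
print(handler.text)  # Photosynthesis converts light.
```

The same object can both update a UI per token and hand back the finished string afterwards.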
Sample Program

This program streams the answer token by token to the console, then prints the full answer.

LangChain
from langchain.llms import OpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Enable streaming to print tokens as they come
llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()])

# Ask the model a question
response = llm("What is the capital of France?")

print(f"\nFull response: {response}")
Important Notes

Streaming requires your model provider to support token streaming.

Callbacks let you customize what happens with each token (e.g., update UI, log, or process).

Streaming improves user experience by showing partial results early.
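When the "process each token" step feeds a UI, tokens are often handed off through a thread-safe queue that the interface thread drains. A minimal sketch of that hand-off, with illustrative names and no LangChain dependency:

```python
import queue


class QueueHandler:
    """Pushes each streamed token onto a queue for a UI loop to drain (illustrative)."""

    def __init__(self):
        self.q = queue.Queue()

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Called from the generation thread; Queue is safe across threads.
        self.q.put(token)


handler = QueueHandler()
for tok in ["Hi", " there", "!"]:
    handler.on_llm_new_token(tok)

# A UI loop would poll the queue instead of this drain-all step.
pieces = []
while not handler.q.empty():
    pieces.append(handler.q.get())
print("".join(pieces))  # Hi there!
```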

Summary

Streaming sends tokens as soon as they are generated.

Use streaming=True and callbacks to enable streaming in LangChain.

Streaming is great for chatbots and apps needing fast, live feedback.