Streaming lets your app display model output incrementally, token by token, as it is generated. Users see the first words sooner, so the app feels faster and more responsive even though total generation time is unchanged.
Streaming in production in LangChain
Introduction
Syntax
LangChain
from langchain.llms import OpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()])
response = llm("Tell me a story.")
Set streaming=True to enable streaming mode.
Pass a list of callbacks to handle streamed tokens as they arrive.
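To make the callback mechanism concrete, here is a minimal sketch in plain Python that does not import LangChain or call any API. FakeStreamingModel, PrintHandler, and CountHandler are hypothetical stand-ins; only the hook name on_llm_new_token comes from the examples on this page. It illustrates what passing a list of callbacks amounts to: each handler in the list is notified once per token.

```python
import io
from typing import Iterable, List


class PrintHandler:
    """Hypothetical handler: writes each token to a stream as it arrives."""

    def __init__(self, stream: io.StringIO) -> None:
        self.stream = stream

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.stream.write(token)


class CountHandler:
    """Hypothetical handler: counts how many tokens were streamed."""

    def __init__(self) -> None:
        self.count = 0

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.count += 1


class FakeStreamingModel:
    """Stand-in for a streaming LLM; emits a fixed token sequence."""

    def __init__(self, tokens: Iterable[str], callbacks: List[object]) -> None:
        self.tokens = list(tokens)
        self.callbacks = callbacks

    def __call__(self, prompt: str) -> str:
        pieces = []
        for token in self.tokens:
            # Notify every registered handler before producing the next token.
            for handler in self.callbacks:
                handler.on_llm_new_token(token)
            pieces.append(token)
        # The full text is still returned once generation finishes.
        return "".join(pieces)


out = io.StringIO()
counter = CountHandler()
model = FakeStreamingModel(
    ["Once", " upon", " a", " time"],
    callbacks=[PrintHandler(out), counter],
)
full_text = model("Tell me a story.")
```

Both handlers see the same four tokens, and the call still returns the complete string at the end, which matches the behavior described for streaming=True with callbacks.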
Examples
LangChain
from langchain.llms import OpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()])
response = llm("Write a poem about spring.")
LangChain
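from langchain.llms import OpenAI
from langchain.callbacks.base import BaseCallbackHandler

# Custom handlers should subclass BaseCallbackHandler
class MyStreamHandler(BaseCallbackHandler):
    # Called once for every token the model streams back
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"Received token: {token}")

handler = MyStreamHandler()
llm = OpenAI(streaming=True, callbacks=[handler])
llm("Explain photosynthesis.")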
Sample Program
This program streams the answer token by token to the console, then prints the full answer.
LangChain
from langchain.llms import OpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Enable streaming to print tokens as they come
llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()])

# Ask the model a question
response = llm("What is the capital of France?")
print(f"\nFull response: {response}")
Important Notes
Streaming requires your model provider to support token streaming.
Callbacks let you customize what happens with each token (e.g., update UI, log, or process).
Streaming improves user experience by showing partial results early.
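As an illustration of processing tokens inside a callback, the sketch below buffers tokens and records the time to the first token, a common way to measure the responsiveness gain from streaming. It is plain Python with no LangChain import; CollectingHandler is a hypothetical name, and the token list is made-up sample data fed in by hand rather than real model output.

```python
import time
from typing import List, Optional


class CollectingHandler:
    """Hypothetical handler that buffers tokens and times the first one."""

    def __init__(self) -> None:
        self.tokens: List[str] = []
        self.started_at = time.perf_counter()
        self.first_token_latency: Optional[float] = None

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Record latency only once, when the first token shows up.
        if self.first_token_latency is None:
            self.first_token_latency = time.perf_counter() - self.started_at
        self.tokens.append(token)

    @property
    def text(self) -> str:
        """Everything received so far, joined into one string."""
        return "".join(self.tokens)


handler = CollectingHandler()
# Simulate a model streaming its answer in six pieces.
for token in ["Par", "is", " is", " the", " capital", "."]:
    handler.on_llm_new_token(token)
```

In a real application the same method body could update a UI element or write to a log instead of appending to a list.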
Summary
Streaming sends tokens as soon as they are generated.
Use streaming=True and callbacks to enable streaming in LangChain.
Streaming is great for chatbots and apps needing fast, live feedback.
Practice
1. What does enabling streaming=True in LangChain do?
easy
Solution
Step 1: Understand streaming behavior in LangChain
Streaming means tokens are sent one by one as soon as they are generated, without waiting for the full response.
Step 2: Match streaming=True effect
Setting streaming=True activates this immediate token sending behavior.
Final Answer:
It sends tokens immediately as they are generated. -> Option A
Quick Check:
Streaming = immediate token sending [OK]
Hint: Streaming means tokens flow out live, not delayed [OK]
Common Mistakes:
- Thinking streaming buffers all tokens first
- Confusing streaming with disabling callbacks
- Assuming streaming delays output
2. Which of the following is the correct way to enable streaming with callbacks in LangChain?
easy
Solution
Step 1: Recall correct parameter names
LangChain's OpenAI class uses 'streaming=True' and 'callbacks' as a list of handlers.
Step 2: Check each option's syntax
llm = OpenAI(streaming=True, callbacks=[MyCallbackHandler()]) correctly uses streaming=True and callbacks as a list. Others misuse parameter names or types.
Final Answer:
llm = OpenAI(streaming=True, callbacks=[MyCallbackHandler()]) -> Option A
Quick Check:
Correct params: streaming=True, callbacks=[handler] [OK]
Hint: Use streaming=True and callbacks as a list [OK]
Common Mistakes:
- Using streaming=False to try enabling streaming
- Passing callbacks as a single object, not a list
- Misspelling parameter names like 'stream' or 'callback'
3. Given this code snippet:
from langchain.llms import OpenAI
from langchain.callbacks.base import BaseCallbackHandler

class PrintTokens(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs):
        print(token, end='')

llm = OpenAI(streaming=True, callbacks=[PrintTokens()])
llm('Hello world')
What will be the output behavior?
medium
Solution
Step 1: Understand the callback handler
The PrintTokens class prints each token immediately when on_llm_new_token is called.
Step 2: Streaming enabled triggers token callbacks live
With streaming=True, tokens are sent and printed one by one as generated.
Final Answer:
Prints each token of the model's response to 'Hello world' immediately as it is generated. -> Option D
Quick Check:
Streaming + on_llm_new_token = live token print [OK]
Hint: Streaming with on_llm_new_token prints tokens live [OK]
Common Mistakes:
- Expecting full output after completion
- Assuming callbacks don't work with streaming
- Missing that print uses end='' to avoid newlines
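The effect of end='' is easy to check without a model. The sketch below prints the same made-up token list twice, capturing standard output each time: once with print's default line ending and once with end="" as in the PrintTokens handler. The flush=True argument is an addition here, not part of the question's code; it is commonly used so partial output appears immediately when stdout is buffered.

```python
import contextlib
import io

# Made-up tokens standing in for streamed model output.
tokens = ["Hel", "lo", " there"]

with_newlines = io.StringIO()
with contextlib.redirect_stdout(with_newlines):
    for token in tokens:
        print(token)  # default end="\n" puts every token on its own line

inline = io.StringIO()
with contextlib.redirect_stdout(inline):
    for token in tokens:
        print(token, end="", flush=True)  # tokens join into running text
```

With the default ending the captured text is broken across three lines, while end="" reproduces the text as the model wrote it.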
4. What is the main issue with this code snippet for streaming in LangChain?
llm = OpenAI(streaming=True, callbacks=PrintTokens())
llm('Test')
medium
Solution
Step 1: Check callback parameter type
LangChain expects callbacks as a list, even if only one handler is used.
Step 2: Identify error cause
Passing callbacks=PrintTokens() (not in a list) causes a type error or unexpected behavior.
Final Answer:
Callbacks must be passed as a list, not a single instance. -> Option C
Quick Check:
Callbacks = list of handlers [OK]
Hint: Always wrap callbacks in a list, even if one [OK]
Common Mistakes:
- Passing a single callback object directly
- Assuming streaming=True is invalid
- Forgetting to implement callback methods
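One defensive option, sketched below, is to normalize the argument before passing it on. as_callback_list is a hypothetical helper written for this page, not a LangChain function, and the empty PrintTokens class is only a placeholder for the handler from the question.

```python
from typing import Any, List, Optional


def as_callback_list(callbacks: Optional[Any]) -> List[Any]:
    """Return callbacks as a list: None -> [], one handler -> [handler]."""
    if callbacks is None:
        return []
    if isinstance(callbacks, (list, tuple)):
        return list(callbacks)
    # A single handler was passed directly; wrap it.
    return [callbacks]


class PrintTokens:
    """Placeholder for the handler class used in the question."""


single = PrintTokens()
normalized = as_callback_list(single)
```

The result of as_callback_list(...) could then be passed as the callbacks argument, so a caller who forgets the brackets does not trigger the error described above.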
5. You want to build a chatbot that shows the model's responses to the user token by token as they are generated. Which combination of LangChain features should you use in production?
hard
Solution
Step 1: Identify streaming usage for live token display
Streaming must be enabled to get tokens as they generate, not after full response.
Step 2: Use callback handler to process tokens live
Implementing on_llm_new_token in a callback lets you display tokens immediately.
Step 3: Confirm best practice for production chatbot
Combining streaming=True with a callback that prints tokens live is the correct approach.
Final Answer:
Use streaming=True with a callback handler implementing on_llm_new_token to display tokens live. -> Option B
Quick Check:
Streaming + on_llm_new_token = live chatbot tokens [OK]
Hint: Streaming plus on_llm_new_token callback shows tokens live [OK]
Common Mistakes:
- Disabling streaming and expecting live tokens
- Not using callbacks to handle tokens
- Printing tokens only after full response
