Consider this LangChain code snippet that uses streaming to get partial outputs:
from langchain.llms import OpenAI
llm = OpenAI(streaming=True)
for token in llm.stream("Hello, how are you?"):
    print(token, end='')

What will this code do when run?
Streaming mode allows receiving tokens as they are generated.
When streaming=True, the stream method yields tokens as they are generated, so the code prints tokens one by one in real time.
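The token-by-token behavior can be sketched without an API call by standing in a plain Python generator for `llm.stream()` (the `fake_stream` function and its token list are illustrative, not part of LangChain):

```python
# fake_stream simulates llm.stream(): it yields the response one
# token at a time, the way a streaming LLM client would.
def fake_stream(prompt):
    for token in ["I", "'m", " doing", " well", "!"]:
        yield token

# Each token is printed as soon as it is yielded, so output appears
# incrementally rather than all at once.
for token in fake_stream("Hello, how are you?"):
    print(token, end='')
```

Because the loop body runs once per yielded token, the caller can display partial output long before the full response is complete.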
You want to create an OpenAI LLM instance that streams output tokens. Which code snippet correctly enables streaming?
Check the official LangChain parameter name for streaming.
The correct parameter to enable streaming is streaming=True. Other options are invalid and cause errors.
Given this code snippet:
collected = ""
for token in llm.stream("Say hello"):
    collected += token
print(collected)

What will be printed?
Tokens are concatenated in the loop into a string variable.
Each token is added to the string collected, so after the loop, collected contains the full response.
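The accumulation pattern can be verified with the same generator stand-in for `llm.stream()` (`fake_stream` is illustrative, not a LangChain API):

```python
# fake_stream simulates llm.stream() yielding two tokens.
def fake_stream(prompt):
    for token in ["Hello", "!"]:
        yield token

# Concatenate each token onto the running string; after the loop,
# collected holds the complete response.
collected = ""
for token in fake_stream("Say hello"):
    collected += token
print(collected)  # prints the full response: Hello!
```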
Look at this code:
llm = OpenAI(streaming=True)
response = llm("Hello")
for token in response:
    print(token)

Why does it raise an error?
Check how streaming tokens are accessed in LangChain.
When streaming=True, you must call llm.stream() to get an iterable of tokens. Calling llm() directly returns the completed response as a single string, not a token stream, so looping over it does not give you tokens as they are generated.
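The contrast between the two call styles can be sketched with a stub class (`FakeLLM` is illustrative, not part of LangChain; it only mimics the shape of the interface):

```python
# FakeLLM mimics an LLM wrapper: __call__ returns the whole response
# as one string, while stream() yields it token by token.
class FakeLLM:
    tokens = ["Hi", " there"]

    def __call__(self, prompt):
        # Returns the completed string, even with streaming enabled.
        return "".join(self.tokens)

    def stream(self, prompt):
        # Yields one token at a time.
        yield from self.tokens

llm = FakeLLM()
full = llm("Hello")                # a plain string: "Hi there"
parts = list(llm.stream("Hello"))  # a list of tokens: ["Hi", " there"]
```

Note that iterating over the plain string would yield individual characters, not tokens, which is why `stream()` is the right entry point for token-level output.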
Why would a developer choose to use streaming responses when calling an LLM in LangChain?
Think about user experience and response speed.
Streaming allows tokens to be received and displayed immediately as they are generated, improving responsiveness and user experience.