LangChain framework · ~5 mins

Streaming responses in LangChain

Introduction

Streaming responses let your app display the model's answer incrementally, piece by piece, as it is generated. This shortens perceived wait time and keeps users engaged. Streaming is a good fit in situations like these:

When you want to display AI-generated text as it is created, like in chat apps.
When responses take time and you want to show progress to users.
When you want to reduce perceived delay in interactive applications.
When handling large outputs that arrive in parts.
When building real-time assistants or conversational bots.
Syntax
LangChain
from langchain.llms import OpenAI  # in newer releases: from langchain_openai import OpenAI

llm = OpenAI(streaming=True)

# stream() yields the response one token at a time
for token in llm.stream("Hello, how are you?"):
    print(token, end='')

Set streaming=True when creating the LLM instance to enable streaming.

Use the stream() method to get tokens as they arrive.
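You can try this loop pattern without an API key by swapping in a plain generator. The fake_stream function below is a stand-in for llm.stream(), not part of LangChain:

```python
# Stand-in for llm.stream(): a generator that yields tokens one at a time.
def fake_stream(prompt):
    yield from ["Streaming", " looks", " like", " this", "."]

# Consume the stream exactly as you would with a real LLM.
for token in fake_stream("Hello, how are you?"):
    print(token, end="", flush=True)
print()
```

The key point is that the consumer is just a for loop over an iterator; swapping the fake generator for a real llm.stream() call changes nothing in the loop body.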

Examples
This prints the joke token by token as the AI generates it.
LangChain
llm = OpenAI(streaming=True)
for token in llm.stream("Tell me a joke."):
    print(token, end='')
This collects all tokens into a string and prints the full poem at the end.
LangChain
llm = OpenAI(streaming=True)
response = ""
for token in llm.stream("Write a poem."):
    response += token
print(response)
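Modern LangChain runnables also expose an async astream() method for use inside async apps. The sketch below shows the same display-and-collect pattern with a hypothetical fake_astream generator standing in for llm.astream(), so it runs without a provider:

```python
import asyncio

# Hypothetical stand-in for llm.astream(): an async generator that yields
# tokens with a small delay, mimicking network arrival.
async def fake_astream(prompt):
    for token in ["Roses", " are", " red."]:
        await asyncio.sleep(0.01)
        yield token

async def main():
    chunks = []
    # async for replaces the plain for loop used with stream()
    async for token in fake_astream("Write a poem."):
        print(token, end="", flush=True)
        chunks.append(token)
    print()
    return "".join(chunks)

poem = asyncio.run(main())
```

This lets a web server or bot keep handling other work while tokens trickle in.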
Sample Program

This program asks the AI about the weather and prints each token as it arrives, so the user sees the answer build up live.

LangChain
from langchain.llms import OpenAI

# Create an OpenAI LLM with streaming enabled
llm = OpenAI(streaming=True)

# Stream tokens from the prompt and print them as they arrive
print("AI says:")
for token in llm.stream("What is the weather like today?"):
    print(token, end='', flush=True)
print()
Important Notes

Streaming requires your LLM provider to support it; check your API docs.

Use flush=True in print to show tokens immediately in the console.

Streaming improves user experience, but the code is more complex than waiting for a single full response.
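If a provider does not support streaming, one defensive pattern is to try the stream and fall back to a single blocking call. The FakeLLM class below is a hypothetical minimal client used only to illustrate the pattern; it is not a LangChain class:

```python
# Hypothetical minimal client for illustration; not part of LangChain.
class FakeLLM:
    supports_streaming = False

    def stream(self, prompt):
        # A generator raises only when iterated, so the except below still catches this.
        if not self.supports_streaming:
            raise NotImplementedError("provider does not support streaming")
        yield from prompt.split()

    def invoke(self, prompt):
        return "full response"

llm = FakeLLM()
try:
    result = "".join(llm.stream("Hello"))
except NotImplementedError:
    # Fall back to one blocking call when streaming is unavailable.
    result = llm.invoke("Hello")
print(result)
```

Checking your provider's API docs up front, as noted above, is still the cleaner option.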

Summary

Streaming responses show AI output bit by bit as it is generated.

Enable streaming by setting streaming=True when creating the LLM.

Use streaming to improve user experience in chat and interactive apps.