LangChain framework, ~30 mins

Streaming responses in LangChain - Mini Project: Build & Apply

Streaming Responses with LangChain

📖 Scenario: You are building a chatbot that replies to user questions in real time. Instead of making the user wait for the full answer, the chatbot streams the response piece by piece, like a friend typing back to you live.

🎯 Goal: Create a LangChain setup that streams responses from a language model. You will first set up the model, then create a prompt template, implement a streaming callback handler, and finally run the model so the answer streams to the output.

📋 What You'll Learn

Create a LangChain ChatOpenAI language model instance with streaming enabled

Set up a simple prompt template for the chatbot

Use a callback handler (a BaseCallbackHandler subclass) to capture streamed tokens

Print each token as it streams to simulate live typing

💡 Why This Matters

🌍 Real World

Streaming responses are used in chatbots and assistants to provide faster, more interactive user experiences by showing answers as they are generated.

💼 Career

Understanding streaming in language models is important for building responsive AI applications in customer support, education, and interactive tools.

Set up the LangChain ChatOpenAI model with streaming enabled

Create a variable called llm that is an instance of ChatOpenAI with streaming=True and temperature=0.

LangChain

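from langchain_openai import ChatOpenAI  # older releases: from langchain.chat_models import ChatOpenAI

# Your code here to create the ChatOpenAI instance with streaming enabled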

Hint

Use ChatOpenAI(streaming=True, temperature=0) to create the model instance.
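To make the effect of streaming concrete, here is a minimal sketch that contrasts a reply delivered all at once with the same reply delivered piece by piece. It uses only the standard library and does not call LangChain or any model; PIECES, reply_blocking, and reply_streaming are hypothetical names chosen for illustration.

```python
from typing import Iterator, List

# Hypothetical pre-split reply; a real model decides the pieces itself.
PIECES: List[str] = ["AI ", "is ", "the ", "study ", "of ", "intelligent ", "machines."]


def reply_blocking(pieces: List[str]) -> str:
    """Return the whole reply only after every piece is available."""
    return "".join(pieces)


def reply_streaming(pieces: List[str]) -> Iterator[str]:
    """Hand over each piece as soon as it is available."""
    for piece in pieces:
        yield piece


# Blocking: nothing is shown until the full text exists.
blocking = reply_blocking(PIECES)
print(blocking)

# Streaming: each piece is shown immediately, on the same line.
streamed: List[str] = []
for piece in reply_streaming(PIECES):
    streamed.append(piece)
    print(piece, end="", flush=True)
print()
```

Both paths produce the same final text. The difference is when the reader sees the first words.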

Create a prompt template for the chatbot

Create a variable called prompt that is a PromptTemplate with the input variable question and template string 'Answer this: {question}'.

LangChain

from langchain_core.prompts import PromptTemplate  # older releases: from langchain.prompts import PromptTemplate

# Your code here to create the prompt template

Hint

Use PromptTemplate(input_variables=["question"], template="Answer this: {question}").
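As a rough illustration of what the template does when it is formatted, the sketch below fills the same template string with plain Python. It assumes the default f-string style of PromptTemplate, which substitutes named placeholders much like str.format; format_prompt is a hypothetical helper, not a LangChain function.

```python
# Same template string as in this step.
template = "Answer this: {question}"


def format_prompt(template_text: str, **values: str) -> str:
    """Fill the named placeholders in template_text (hypothetical helper)."""
    return template_text.format(**values)


filled = format_prompt(template, question="What is AI?")
print(filled)
```

The filled string is what gets sent to the model in the final step.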

Implement the streaming callback handler

Create a class called StreamingHandler that inherits from BaseCallbackHandler. Implement the method on_llm_new_token(self, token: str, **kwargs) that prints each token without a newline.

LangChain

# Recent LangChain releases ship these classes in langchain-core and langchain-openai.
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(streaming=True, temperature=0)
prompt = PromptTemplate(input_variables=["question"], template="Answer this: {question}")

# Define StreamingHandler class here
# Your code here

Hint

Override on_llm_new_token to print tokens as they arrive.
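The handler's job can be tried out without a model by feeding it tokens by hand. The sketch below is a standalone stand-in that only mimics the shape of a callback handler: PrintingHandler is a hypothetical name, it does not inherit from BaseCallbackHandler, and the token list is made up.

```python
from typing import List


class PrintingHandler:
    """Mimics the shape of a LangChain callback handler (no LangChain import)."""

    def __init__(self) -> None:
        self.tokens: List[str] = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Keep the token for later and print it without a newline,
        # so consecutive tokens appear as one growing line of text.
        self.tokens.append(token)
        print(token, end="", flush=True)


demo_handler = PrintingHandler()
for token in ["Stream", "ing ", "works", "."]:
    demo_handler.on_llm_new_token(token)
print()  # finish the line once the stream ends

full_text = "".join(demo_handler.tokens)
```

Storing the tokens as well as printing them is optional. It is handy when the full reply is needed after the stream ends.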

Run the streaming LLM with the callback handler

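Create an instance of StreamingHandler called handler. Then call llm.invoke with prompt.format(question='What is AI?') and config={"callbacks": [handler]} to stream the answer. Calling the model object directly, as in llm(...), is deprecated in recent LangChain releases, and a chat model's direct call expects a list of messages rather than a plain string.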

LangChain

# Recent LangChain releases ship these classes in langchain-core and langchain-openai.
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(streaming=True, temperature=0)
prompt = PromptTemplate(input_variables=["question"], template="Answer this: {question}")

class StreamingHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Print each token as it arrives, without a trailing newline
        print(token, end='', flush=True)

# Create handler instance and call llm.invoke with the handler in config["callbacks"]
# Your code here

Hint

Create handler = StreamingHandler() and call llm.invoke(prompt.format(question='What is AI?'), config={"callbacks": [handler]}).
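The wiring in this step (a formatted prompt, a handler instance, and callbacks passed to the model call) can be rehearsed offline. The following sketch is an assumption-laden imitation: FakeStreamingModel and RecordingHandler are hypothetical classes, the tokens are invented, and no API key or network access is involved. Unlike this stand-in, a real chat model returns a message object rather than a plain string.

```python
from typing import Dict, List, Optional


class FakeStreamingModel:
    """Hypothetical stand-in for a streaming chat model; makes no API calls."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.last_prompt: Optional[str] = None

    def invoke(self, text: str, config: Optional[Dict] = None) -> str:
        # Remember the prompt, then notify every callback once per token.
        self.last_prompt = text
        callbacks = (config or {}).get("callbacks", [])
        for token in self.tokens:
            for callback in callbacks:
                callback.on_llm_new_token(token)
        # Simplification: return the joined text instead of a message object.
        return "".join(self.tokens)


class RecordingHandler:
    """Prints each token and records it for inspection afterwards."""

    def __init__(self) -> None:
        self.seen: List[str] = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.seen.append(token)
        print(token, end="", flush=True)


fake_llm = FakeStreamingModel(["AI ", "is ", "software ", "that ", "learns."])
recorder = RecordingHandler()
question_text = "Answer this: {question}".format(question="What is AI?")

answer = fake_llm.invoke(question_text, config={"callbacks": [recorder]})
print()  # end the streamed line
```

The handler sees the reply token by token, while the return value of invoke still carries the complete text once the call finishes.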

Practice

1. What does enabling streaming=True do in a LangChain LLM? (easy)

A. It disables the AI's output completely.

B. It shows the AI's output bit by bit as it is generated.

C. It caches the AI's output for later use.

D. It speeds up the AI's training process.
