Loading web pages with WebBaseLoader in LangChain - Step-by-Step Execution

Concept Flow - Loading web pages with WebBaseLoader

Start → Create WebBaseLoader with URL → Call load() method → Fetch web page content → Parse and store content → Return loaded documents → End

This flow shows how WebBaseLoader takes a URL, fetches the web page content, parses it, and returns it as documents.

Execution Sample

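# Recent LangChain releases ship this loader in langchain_community;
# older releases imported it from langchain.document_loaders.
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader('https://example.com')  # store the target URL
docs = loader.load()                           # fetch, parse, return a list of Documents
print(docs[0].page_content[:100])              # first 100 characters of the extracted text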

This code loads the content of 'https://example.com' using WebBaseLoader and prints the first 100 characters of the extracted text.
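Indexing docs[0] directly assumes the loader returned at least one document, and extracted page text often contains long runs of blank lines. The following is a minimal sketch of a safer preview step. The names FakeDocument and preview are hypothetical helpers introduced for illustration, not part of LangChain. FakeDocument only mimics the page_content attribute so the example runs without network access.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in exposing the same page_content attribute as a loaded document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def preview(docs, limit=100):
    """Return up to `limit` characters of the first document, or '' if nothing was loaded."""
    if not docs:
        return ""
    # Collapse runs of whitespace so blank lines do not dominate the preview.
    text = " ".join(docs[0].page_content.split())
    return text[:limit]


sample_docs = [FakeDocument("\n\n  Example Domain\n\n\nThis domain is for use in examples.\n")]
print(preview(sample_docs))
print(repr(preview([])))
```

With a real loader, the same helper could be called as preview(loader.load()), assuming the returned objects expose page_content as in the sample above.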

Execution Table

Step	Action	Input/State	Output/State
1	Create WebBaseLoader	URL='https://example.com'	loader instance created with URL
2	Call load()	loader instance	Starts fetching web page content
3	Fetch content	HTTP GET request to URL	Raw HTML content received
4	Parse content	Raw HTML content	Parsed text extracted from HTML
5	Store content	Parsed text	Document object created with page_content
6	Return documents	Document object	List of documents returned
7	Print first 100 chars	docs[0].page_content	First 100 characters of page content shown

💡 Loading completes after fetching, parsing, and returning the web page content as documents.
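Steps 4 to 6 of the table can be sketched with the Python standard library alone. This is an illustration of the idea, not WebBaseLoader's actual implementation: the real loader uses an HTTP client and an HTML parsing library, while this sketch swaps in html.parser and replaces the HTTP request of step 3 with a hard-coded HTML string so it runs offline. SketchDocument, TextExtractor, and load_from_html are hypothetical names.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a document: extracted text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def load_from_html(raw_html, source):
    """Parse raw HTML (step 4), wrap the text in a document (step 5), return a list (step 6)."""
    extractor = TextExtractor()
    extractor.feed(raw_html)
    extractor.close()
    text = "\n".join(extractor.chunks)
    return [SketchDocument(page_content=text, metadata={"source": source})]


# Step 3 is replaced by a hard-coded response so the sketch needs no network.
raw_html = (
    "<html><head><title>Example Domain</title><style>body{margin:0}</style></head>"
    "<body><h1>Example Domain</h1><p>This domain is for use in examples.</p></body></html>"
)
docs = load_from_html(raw_html, "https://example.com")
print(docs[0].page_content[:100])
```

The returned value is a list even for a single page, which is why the execution sample indexes docs[0] before reading page_content.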

Variable Tracker

Variable	Start	After Step 1	After Step 2	After Step 3	After Step 4	After Step 5	Final
loader	None	WebBaseLoader instance with URL	Same instance	Same instance	Same instance	Same instance	Same instance
docs	None	None	None	None	None	None	List with Document object
page_content	None	None	None	Raw HTML string	Parsed text string	Parsed text string	Parsed text string

Key Moments - 3 Insights

Why do we create a WebBaseLoader instance before calling load()?

What does the load() method return?

Is the content raw HTML or parsed text when returned?

Visual Quiz - 3 Questions

Test your understanding

Look at the execution table. What is the output after Step 3?

A. Parsed text extracted from HTML

B. Document object created

C. Raw HTML content received

D. List of documents returned

Concept Snapshot

WebBaseLoader loads web pages by:
1. Creating an instance with a URL
2. Calling load() to fetch the page
3. Parsing HTML to text
4. Returning documents with page content
Use load() to get a list of Document objects containing the page text.

Full Transcript

Loading web pages with WebBaseLoader involves creating a loader instance with the target URL. When load() is called, it fetches the web page content via HTTP, parses the HTML to extract readable text, and stores it in Document objects. Finally, it returns a list of these documents. This process allows easy access to web page text for further processing.