
Loading web pages with WebBaseLoader in LangChain - Step-by-Step Execution

Concept Flow - Loading web pages with WebBaseLoader
Start
Create WebBaseLoader with URL
Call load() method
Fetch web page content
Parse and store content
Return loaded documents
End
This flow shows how WebBaseLoader takes a URL, fetches the web page content, parses it, and returns it as documents.
Execution Sample
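# Requires the langchain-community and beautifulsoup4 packages.
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader('https://example.com')  # configure the loader with the target URL
docs = loader.load()  # fetch the page and return a list of Document objects
print(docs[0].page_content[:100])  # first 100 characters of the extracted text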
This code loads the content of 'https://example.com' using WebBaseLoader and prints the first 100 characters.
Execution Table
| Step | Action | Input/State | Output/State |
|---|---|---|---|
| 1 | Create WebBaseLoader | URL='https://example.com' | loader instance created with URL |
| 2 | Call load() | loader instance | Starts fetching web page content |
| 3 | Fetch content | HTTP GET request to URL | Raw HTML content received |
| 4 | Parse content | Raw HTML content | Parsed text extracted from HTML |
| 5 | Store content | Parsed text | Document object created with page_content |
| 6 | Return documents | Document object | List of documents returned |
| 7 | Print first 100 chars | docs[0].page_content | First 100 characters of page content shown |
💡 Loading completes after fetching, parsing, and returning the web page content as documents.
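The fetch, parse, and store steps in the table can be sketched with the standard library alone. This is an illustration of the flow, not WebBaseLoader's actual implementation: the `Document` dataclass, the `fetch`, `parse`, and `load` functions, and the `fetcher` parameter are all hypothetical names introduced here, and a canned HTML string stands in for the network call so the sketch runs offline.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from urllib.request import urlopen


@dataclass
class Document:
    """Hypothetical stand-in for LangChain's Document class."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def fetch(url):
    """Step 3: HTTP GET, returning the raw HTML as a string."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def parse(raw_html):
    """Step 4: extract readable text from raw HTML."""
    extractor = _TextExtractor()
    extractor.feed(raw_html)
    extractor.close()
    return "\n".join(extractor.parts)


def load(url, fetcher=fetch):
    """Steps 2 to 6: fetch, parse, wrap in a Document, return a list."""
    text = parse(fetcher(url))
    return [Document(page_content=text, metadata={"source": url})]


# Offline demo: a canned page replaces the real HTTP request.
sample_html = (
    "<html><head><title>Demo</title><style>p{}</style></head>"
    "<body><h1>Hello</h1><p>World</p></body></html>"
)
docs = load("https://example.com", fetcher=lambda url: sample_html)
print(docs[0].page_content[:100])  # Step 7: first 100 characters
```

Passing the fetcher as a parameter is only a convenience for running the sketch without network access. WebBaseLoader itself relies on the requests and BeautifulSoup libraries rather than these standard-library pieces.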
Variable Tracker
| Variable | Start | After Step 1 | After Step 2 | After Step 3 | After Step 4 | After Step 5 | Final |
|---|---|---|---|---|---|---|---|
| loader | None | WebBaseLoader instance with URL | Same instance | Same instance | Same instance | Same instance | Same instance |
| docs | None | None | None | None | None | List with Document object | List with Document object |
| page_content | None | None | None | Raw HTML string | Parsed text string | Parsed text string | Parsed text string |
Key Moments - 3 Insights
Why do we create a WebBaseLoader instance before calling load()?
Because the loader needs to know which URL to fetch. Step 1 shows creating the loader with the URL, so load() knows what to load.
What does the load() method return?
It returns a list of Document objects containing the parsed page content, as shown in Step 6 and Step 7.
Is the content raw HTML or parsed text when returned?
The content is parsed text extracted from the HTML, not raw HTML, as shown in Step 4 and Step 5.
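The answers above describe what load() returns: a list whose items expose the extracted text as page_content. A minimal sketch of inspecting that list follows. The helper names `summarize` and `load_and_summarize` are hypothetical, and the sketch assumes the loader records the page URL under `metadata["source"]`. The live call is wrapped in a function with a lazy import, and a stand-in object is used for the demo so the block runs without LangChain or network access.

```python
from types import SimpleNamespace


def summarize(docs, limit=100):
    """Return (source, preview) pairs for a list of Document-like objects."""
    return [
        (doc.metadata.get("source"), doc.page_content[:limit])
        for doc in docs
    ]


def load_and_summarize(url, limit=100):
    """Load a live page; needs langchain-community, beautifulsoup4, and network access."""
    # Imported lazily so the offline demo below runs without LangChain installed.
    from langchain_community.document_loaders import WebBaseLoader

    return summarize(WebBaseLoader(url).load(), limit)


# Offline demo: a hypothetical stand-in for one loaded Document.
fake_docs = [
    SimpleNamespace(
        page_content="Example Domain. " * 20,
        metadata={"source": "https://example.com"},
    )
]
summary = summarize(fake_docs)
print(summary[0][0], len(summary[0][1]))
```

Slicing with `[:limit]` is safe even when the page text is shorter than the limit, whereas indexing `docs[0]` directly raises an IndexError on an empty list, so iterating over the list is the more defensive pattern.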
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table, what is the output after Step 3?
A. Parsed text extracted from HTML
B. Document object created
C. Raw HTML content received
D. List of documents returned
💡 Hint
Check the 'Output/State' column for Step 3 in the execution table.
At which step does the loader return the list of documents?
A. Step 6
B. Step 5
C. Step 4
D. Step 7
💡 Hint
Look for the step where 'Return documents' happens in the execution table.
If the URL changes, which variable in the variable tracker has a different value after Step 1?
A. docs
B. loader
C. page_content
D. None
💡 Hint
Step 1 shows creating the loader with the URL, affecting the loader variable.
Concept Snapshot
WebBaseLoader loads web pages by:
1. Creating an instance with a URL
2. Calling load() to fetch the page
3. Parsing HTML to text
4. Returning documents with page content
Use load() to get a list of Document objects containing the page text.
Full Transcript
Loading web pages with WebBaseLoader involves creating a loader instance with the target URL. When load() is called, it fetches the web page content via HTTP, parses the HTML to extract readable text, and stores it in Document objects. Finally, it returns a list of these documents. This process allows easy access to web page text for further processing.