Loading web pages with WebBaseLoader in LangChain

Introduction

WebBaseLoader is a LangChain document loader that downloads a web page and extracts its text into Document objects. This means you do not have to fetch and clean up the HTML yourself.

Typical situations where it helps:
You want to read and analyze the text of a website in your program.
You need to collect information from many web pages automatically.
You want to prepare web page content for a language model or other text processing.
You are building a tool that summarizes or searches web page content.
You want to avoid copying and pasting web page text by hand.
Syntax
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader(url)
documents = loader.load()
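Replace url with the address of the page you want to load, given as a string such as 'https://example.com'.
load() downloads the page and returns a list of Document objects. Each Document holds the extracted text in page_content and information such as the source URL in metadata.
WebBaseLoader is provided by the langchain-community package and needs beautifulsoup4 installed (pip install langchain-community beautifulsoup4).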
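The fetch-and-extract step behind load() can be approximated with the standard library alone. The sketch below is not LangChain's code (WebBaseLoader uses requests and BeautifulSoup), and SimpleDocument, TextExtractor and html_to_document are hypothetical names. It parses an HTML string, so it runs offline, skips the contents of script and style tags, and returns a Document-like object with page_content and metadata. The metadata holds the source URL and the page title, which roughly mirrors the shape of what load() returns.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for LangChain's Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class TextExtractor(HTMLParser):
    """Collects visible text and the <title>, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []          # non-empty text fragments, in page order
        self.title = ""
        self._skip_depth = 0     # > 0 while inside <script> or <style>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return               # ignore code and CSS
        if self._in_title:
            self.title += data
        text = data.strip()
        if text:
            self.parts.append(text)


def html_to_document(html, source):
    """Turn one HTML page into a SimpleDocument, one per URL like load()."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    metadata = {"source": source, "title": parser.title.strip()}
    return SimpleDocument("\n".join(parser.parts), metadata)


html = """<html><head><title>Example Domain</title>
<style>body { color: black; }</style></head>
<body><h1>Example Domain</h1><p>This domain is for examples.</p>
<script>console.log("hidden");</script></body></html>"""

doc = html_to_document(html, "https://example.com")
print(doc.metadata)
print(doc.page_content)
```

Separating download from parsing, as here, also makes the extraction logic easy to test without network access.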
Examples
Loads the text content from https://example.com into docs.
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader('https://example.com')
docs = loader.load()
Loads the Wikipedia page about the Python programming language.
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader('https://en.wikipedia.org/wiki/Python_(programming_language)')
content = loader.load()
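WebBaseLoader also accepts a list of URLs, for example WebBaseLoader(['https://example.com', 'https://en.wikipedia.org/wiki/Python_(programming_language)']). In that case load() returns one Document per page. The sketch below imitates that one-document-per-URL loop with the standard library. SimpleDocument, fetch_html and load_many are hypothetical names. Text extraction is left out to keep the sketch short. Skipping pages that fail to download is a choice made for this sketch, not a claim about WebBaseLoader's default behavior. The demo passes in a fake fetcher so it runs without network access.

```python
from dataclasses import dataclass, field
from typing import Callable, List
from urllib.request import urlopen


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def fetch_html(url: str) -> str:
    """Download a page and decode it as UTF-8 (real network call)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def load_many(urls: List[str],
              fetch: Callable[[str], str] = fetch_html) -> List[SimpleDocument]:
    """Return one document per URL that downloads successfully."""
    docs = []
    for url in urls:
        try:
            html = fetch(url)
        except OSError as exc:  # URLError, HTTPError and timeouts are OSErrors
            print(f"skipping {url}: {exc}")
            continue
        # Text extraction omitted: page_content holds the raw HTML here.
        docs.append(SimpleDocument(html, {"source": url}))
    return docs


# Offline demo: a dictionary of fake pages stands in for the network.
pages = {"https://a.example": "<p>Page A</p>", "https://b.example": "<p>Page B</p>"}


def fake_fetch(url: str) -> str:
    if url not in pages:
        raise OSError("not found")
    return pages[url]


docs = load_many(["https://a.example", "https://missing.example", "https://b.example"],
                 fetch=fake_fetch)
print([d.metadata["source"] for d in docs])
```

Making the fetch function a parameter keeps the loop testable; in real use you would call load_many(urls) and let it default to fetch_html.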
Sample Program

This program loads https://example.com and prints the first 300 characters of the extracted text, showing the basic WebBaseLoader workflow from URL to text.

from langchain_community.document_loaders import WebBaseLoader

# Create a loader for a simple web page
loader = WebBaseLoader('https://example.com')

# Load the documents (web page content)
documents = loader.load()

# Print the first 300 characters of the page text
print(documents[0].page_content[:300])
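The extracted page_content often contains runs of blank lines and indentation left over from the HTML layout, so a raw 300-character slice can be mostly whitespace. A small helper like the hypothetical preview() below collapses whitespace before truncating. It is plain Python and works on any string; the sample text is made up for illustration.

```python
import re


def preview(text: str, limit: int = 300) -> str:
    """Collapse every run of whitespace to one space, then truncate to limit."""
    flat = re.sub(r"\s+", " ", text).strip()
    if len(flat) <= limit:
        return flat
    # Mark the cut so readers know the text continues.
    return flat[:limit].rstrip() + "..."


sample = "\n\n\nExample Domain\n\n\n    This domain is for use in examples.\n\n"
print(preview(sample))
print(preview(sample, limit=14))
```

In the sample program, you could replace the last line with print(preview(documents[0].page_content)).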
Important Notes

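WebBaseLoader downloads the raw HTML with the requests library and extracts the text with BeautifulSoup. It does not execute JavaScript, so content that a page builds in the browser may be missing from the result.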

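Some sites block automated requests or expect particular HTTP headers, such as a browser-like User-Agent. WebBaseLoader's header_template argument lets you supply your own headers.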
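With WebBaseLoader you would pass a dict such as WebBaseLoader('https://example.com', header_template={'User-Agent': '...'}). The standard-library sketch below shows the same idea with urllib: custom headers are attached to the request before it is sent. The header values and the my-doc-loader name are illustrative assumptions, and building the Request object does not contact the site.

```python
from urllib.request import Request

# Illustrative header values; adjust them to what the target site expects.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; my-doc-loader/0.1)",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url: str, headers: dict) -> Request:
    """Create a GET request carrying custom headers (nothing is sent yet)."""
    return Request(url, headers=headers)


req = build_request("https://example.com", HEADERS)
# urllib normalizes header names with str.capitalize(), e.g. "User-agent".
print(req.get_header("User-agent"))
# To actually download: urllib.request.urlopen(req, timeout=10).read()
```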

Always respect a website's terms of use and its robots.txt rules when loading pages.
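To check robots.txt rules in code before loading a page, you can use Python's standard urllib.robotparser module. The sketch below parses a hypothetical robots.txt from a string so it runs offline. For a real site you would call set_url() with the site's robots.txt address and then read(). The user-agent name my-doc-loader is an assumption for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used for the offline demo.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
parser.modified()  # mark the rules as freshly loaded so can_fetch() uses them

for url in ["https://example.com/docs/page", "https://example.com/private/data"]:
    allowed = parser.can_fetch("my-doc-loader", url)
    print(url, "allowed" if allowed else "disallowed")
```

Only URLs for which can_fetch() returns True would then be passed to WebBaseLoader.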

Summary

WebBaseLoader turns the text of web pages into LangChain Document objects.

Use it when you want to process or analyze web content in your code.

Give it a URL and call load() to get the page text as a list of Documents.