
Loading web pages with WebBaseLoader in LangChain - Performance & Optimization

Performance: Loading web pages with WebBaseLoader
MEDIUM IMPACT
This affects the initial data loading speed and memory usage when fetching web pages for processing.
Good pattern: loading multiple web pages concurrently in LangChain
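import asyncio

from langchain_community.document_loaders import WebBaseLoader


async def load_page(url):
    # WebBaseLoader.load() is a blocking (requests-based) call, so run it in a
    # worker thread; the event loop can then wait on several pages at once.
    loader = WebBaseLoader(url)
    return await asyncio.to_thread(loader.load)


async def load_all(urls):
    # gather() runs the per-page coroutines concurrently and returns
    # their results in the same order as urls.
    return await asyncio.gather(*(load_page(url) for url in urls))


urls = ["https://example.com/page1", "https://example.com/page2"]
results = asyncio.run(load_all(urls))
docs = [doc for batch in results for doc in batch]  # flatten the per-page lists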
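Loads pages concurrently: while one request waits on the network, the others proceed, so the waits overlap instead of adding up. Memory use ends up about the same as sequential loading, because every page lands in `docs` either way, although several responses may be in flight at once.
📈 Performance Gain: Total load time approaches that of the slowest single page instead of the sum of all page loads.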
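An unbounded `gather()` starts one request per URL at the same time, which can overload the target site and keeps every in-flight response in memory simultaneously. Below is a minimal sketch of capping concurrency with `asyncio.Semaphore`, assuming a loader with a blocking `load()` method like `WebBaseLoader.load()`. To keep it runnable offline, `FakeLoader` is a hypothetical stand-in that sleeps instead of making an HTTP request, and `max_concurrency` is an illustrative parameter of this sketch, not a WebBaseLoader option.

```python
import asyncio
import time


class FakeLoader:
    """Hypothetical stand-in for WebBaseLoader: load() blocks like an HTTP fetch."""

    def __init__(self, url, delay=0.2):
        self.url = url
        self.delay = delay

    def load(self):
        time.sleep(self.delay)  # simulate network latency (blocking)
        return [f"document from {self.url}"]


async def load_bounded(urls, max_concurrency=3, loader_cls=FakeLoader):
    # At most max_concurrency loads run at the same time.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def load_one(url):
        async with semaphore:
            # load() blocks, so run it in a worker thread instead of on the event loop.
            return await asyncio.to_thread(loader_cls(url).load)

    batches = await asyncio.gather(*(load_one(url) for url in urls))
    return [doc for batch in batches for doc in batch]


urls = [f"https://example.com/page{i}" for i in range(6)]
start = time.perf_counter()
docs = asyncio.run(load_bounded(urls, max_concurrency=3))
elapsed = time.perf_counter() - start
print(len(docs), f"{elapsed:.2f}s")
```

With six simulated 0.2 s pages and three loads at a time, the run should take about two rounds (roughly 0.4 s) rather than 1.2 s. To try the real loader, pass `loader_cls=WebBaseLoader` (from `langchain_community.document_loaders`), whose constructor accepts a single URL as its first argument.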
Bad pattern: loading multiple web pages sequentially in LangChain
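from langchain_community.document_loaders import WebBaseLoader

urls = ["https://example.com/page1", "https://example.com/page2"]
docs = []
for url in urls:
    loader = WebBaseLoader(url)
    # load() blocks until this page is downloaded and parsed,
    # so the next request cannot start until this one finishes.
    docs.extend(loader.load())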
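Each `load()` call blocks until its page has been fetched and parsed, so pages load one after another and total time grows linearly with the number of pages.
📉 Performance Cost: Total load time is roughly N times a single page load (the sum of all N loads). Neither pattern streams, so both keep every page's document in memory.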
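To see the difference without touching the network, the sketch below times both patterns with a hypothetical `slow_load` function that blocks for a fixed delay, standing in for `WebBaseLoader(url).load()`. The delay and page count are illustrative assumptions.

```python
import asyncio
import time


def slow_load(url, delay=0.3):
    """Hypothetical stand-in for WebBaseLoader(url).load(): blocks, then returns docs."""
    time.sleep(delay)
    return [f"document from {url}"]


def load_sequential(urls):
    docs = []
    for url in urls:
        docs.extend(slow_load(url))  # each call waits for the previous one to finish
    return docs


async def load_concurrent(urls):
    # Each blocking call runs in its own worker thread; gather waits for all of them.
    batches = await asyncio.gather(*(asyncio.to_thread(slow_load, url) for url in urls))
    return [doc for batch in batches for doc in batch]


urls = [f"https://example.com/page{i}" for i in range(4)]

start = time.perf_counter()
seq_docs = load_sequential(urls)
seq_time = time.perf_counter() - start

start = time.perf_counter()
conc_docs = asyncio.run(load_concurrent(urls))
conc_time = time.perf_counter() - start

print(f"sequential: {seq_time:.2f}s, concurrent: {conc_time:.2f}s")
```

Expect about 4 × 0.3 s for the sequential run and a little over 0.3 s for the concurrent one. With real pages, the gap depends on each site's latency.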
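Performance Comparison
| Pattern | Total load time (N pages) | Verdict |
|---|---|---|
| Sequential WebBaseLoader calls | ≈ sum of all N page loads | Bad |
| Concurrent WebBaseLoader calls with asyncio | ≈ slowest single page load | Good |
DOM operations, reflows, and paint cost do not apply here: this is backend code with no DOM.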
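The verdicts follow from a simple latency model, assuming page i takes t_i seconds to load, N pages are loaded, scheduling and thread overhead are ignored, and (for the last formula) every page takes the same time t with at most k loads running at once:

```latex
T_{\text{seq}} = \sum_{i=1}^{N} t_i
\qquad
T_{\text{conc}} \approx \max_{1 \le i \le N} t_i
\qquad
T_{\text{conc},k} \approx \left\lceil \frac{N}{k} \right\rceil t
```

For example, 5 pages at 0.4 s each give T_seq = 2.0 s and T_conc ≈ 0.4 s; capping concurrency at k = 2 gives ⌈5/2⌉ × 0.4 = 1.2 s.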
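Loading Pipeline
WebBaseLoader fetches each URL's HTML over HTTP, parses it with BeautifulSoup, and extracts the page text into a LangChain `Document` (with metadata such as the source URL). Any splitting or tokenization happens later, in downstream components such as text splitters or the model's tokenizer.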
Network Fetch
Parsing
Memory Allocation
⚠️ Bottleneck: Network fetch is usually the slowest stage, because each request waits on HTTP round-trip latency and download time.
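One way to confirm where the time goes is to time each stage separately. This sketch is a stand-in, not WebBaseLoader's internals: `fake_fetch` is hypothetical and sleeps to mimic HTTP latency, and parsing uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra packages.

```python
import time
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text content of an HTML document (rough stand-in for BeautifulSoup's get_text())."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


def fake_fetch(url, latency=0.25):
    """Hypothetical network fetch: blocks for `latency` seconds, then returns fixed HTML."""
    time.sleep(latency)
    return f"<html><head><title>{url}</title></head><body><p>Hello from {url}</p></body></html>"


def timed_pipeline(url):
    t0 = time.perf_counter()
    html = fake_fetch(url)        # stage 1: network fetch
    t1 = time.perf_counter()
    parser = TextExtractor()
    parser.feed(html)             # stage 2: parsing
    parser.close()
    text = " ".join(parser.parts)
    t2 = time.perf_counter()
    return text, {"fetch": t1 - t0, "parse": t2 - t1}


text, timings = timed_pipeline("https://example.com/page1")
print(text)
print({stage: round(seconds, 4) for stage, seconds in timings.items()})
```

On a page this small, parsing takes a tiny fraction of the simulated fetch time; on real pages, parsing cost grows with document size, but the network wait usually still dominates.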
Optimization Tips
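1. Load multiple web pages concurrently (for example, with `asyncio.gather`) so their network waits overlap and total wait time drops.
2. Limit how much of each page is parsed and kept in memory; for example, WebBaseLoader's `bs_kwargs` parameter can pass BeautifulSoup's `parse_only=SoupStrainer(...)` so only the relevant elements are parsed.
3. Avoid calling blocking functions such as `load()` directly inside async code; run them in a worker thread (for example, with `asyncio.to_thread`) so the event loop stays responsive.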
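Another way to keep memory down, assuming pages can be processed independently, is to consume documents one at a time (LangChain loaders expose `lazy_load()`, a generator) and keep only what later steps need, such as a truncated excerpt, instead of accumulating every full page. The sketch uses a hypothetical `fake_lazy_load` generator in place of `WebBaseLoader(urls).lazy_load()`, and `MAX_CHARS` is an illustrative limit.

```python
from typing import Iterator, List

MAX_CHARS = 50  # illustrative cap on the text kept per page


def fake_lazy_load(urls: List[str]) -> Iterator[str]:
    """Hypothetical stand-in for WebBaseLoader(urls).lazy_load(): yields one page at a time."""
    for url in urls:
        yield f"Full text of {url}. " * 20  # pretend this is a large page


def excerpt_pages(urls: List[str]) -> List[str]:
    excerpts = []
    for page_text in fake_lazy_load(urls):
        # Only the truncated excerpt is stored; the full page text is no longer
        # referenced once the loop moves on, so it can be garbage-collected.
        excerpts.append(page_text[:MAX_CHARS])
    return excerpts


excerpts = excerpt_pages(["https://example.com/page1", "https://example.com/page2"])
print(excerpts[0])
```

This trades speed for memory: pages are produced one after another, so it fits best when memory, not latency, is the constraint.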
Performance Quiz
What is the main performance benefit of loading web pages concurrently with WebBaseLoader?
A. Reduces total loading time by fetching pages in parallel
B. Decreases the size of each web page fetched
C. Improves the visual rendering speed of the page
D. Eliminates the need for network requests
DevTools: cProfile / time.perf_counter
How to check: Profile with `python -m cProfile your_script.py` or add `import time; start = time.perf_counter(); ...; print(time.perf_counter() - start)` around loading code.
What to look for: Reduced total time in concurrent pattern; overlapping request durations indicating parallelism.
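The `time.perf_counter` check above can be wrapped in a small reusable context manager so each pattern is timed the same way. `timer` is a hypothetical helper name, and the `time.sleep` calls stand in for the loading code being measured.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label, results):
    """Record the wall-clock seconds spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timer("sequential", timings):
    for _ in range(3):
        time.sleep(0.1)  # stand-in for one blocking page load
print({label: round(seconds, 2) for label, seconds in timings.items()})
```

Wrapping the sequential and concurrent loading code in separate `with timer(...)` blocks gives directly comparable numbers in one dictionary.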