How to Use Document Loaders in LangChain: Simple Guide

In LangChain, a document loader reads content from a source such as a file or URL and returns it in a standard format. You create an instance of a loader class (e.g., TextLoader), call its load() method, and get back a list of Document objects, each holding the text in page_content plus a metadata dictionary.

📐

Syntax

The basic syntax involves importing a document loader class, creating an instance with the source path or URL, and calling load() to get documents.

BaseLoader: Base class that document loaders inherit from.
TextLoader: Loads plain text files.
load(): Method to read and return documents as a list.

python

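# Recent releases ship loaders in the langchain-community package;
# older releases used: from langchain.document_loaders import TextLoader
from langchain_community.document_loaders import TextLoader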

loader = TextLoader("path/to/file.txt")
documents = loader.load()
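To make the shape of the result concrete, here is a minimal standard-library sketch of the loader pattern. SketchDocument and SketchTextLoader are hypothetical stand-ins written for illustration, not LangChain's classes; they only mirror the load() call described above and the page_content and metadata attributes of LangChain documents.

```python
import os
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a LangChain Document (illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class SketchTextLoader:
    """Hypothetical stand-in that mimics the loader pattern for plain text."""

    def __init__(self, file_path: str, encoding: str = "utf-8"):
        self.file_path = file_path
        self.encoding = encoding

    def load(self) -> list[SketchDocument]:
        # Read the whole file and wrap it in a single document,
        # recording where the text came from in the metadata.
        text = Path(self.file_path).read_text(encoding=self.encoding)
        return [SketchDocument(page_content=text, metadata={"source": self.file_path})]


# Demo: write a temporary file, then load it through the sketch loader.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "file.txt")
    Path(sample_path).write_text("hello loaders", encoding="utf-8")
    documents = SketchTextLoader(sample_path).load()

print(len(documents), documents[0].page_content)
```

A real loader follows the same outline: the constructor records the source, and load() does the reading and wrapping.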

💻

Example

This example shows how to load a simple text file using TextLoader and print the content of the first document.

python

# Older releases: from langchain.document_loaders import TextLoader
from langchain_community.document_loaders import TextLoader

# Create a loader for a text file
loader = TextLoader("example.txt")

# Load documents (list of Document objects)
documents = loader.load()

# Print the content of the first document
print(documents[0].page_content)

Output

This is the content of example.txt file.
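Besides the text, each returned object carries a metadata dictionary (for TextLoader, a source entry holding the file path). The helper below is a small sketch for inspecting results. It assumes only that each item exposes page_content and metadata; summarize_documents is a hypothetical name, and the SimpleNamespace object stands in for a real document so the sketch runs without LangChain installed.

```python
from types import SimpleNamespace


def summarize_documents(documents):
    """Return (source, character count) for each document-like object."""
    return [
        (doc.metadata.get("source", "<unknown>"), len(doc.page_content))
        for doc in documents
    ]


# Stand-in object shaped like loader output (illustration only).
fake_documents = [
    SimpleNamespace(
        page_content="This is the content of example.txt file.",
        metadata={"source": "example.txt"},
    )
]

summary = summarize_documents(fake_documents)
for source, n_chars in summary:
    print(f"{source}: {n_chars} characters")
```

Passing the list returned by loader.load() instead of fake_documents should work the same way, since only the two attributes are used.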

⚠️

Common Pitfalls

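An incorrect file path or URL makes load() fail with a file-not-found or connection error.
Using a loader incompatible with the file type (e.g., TextLoader for PDFs) will fail.
For large files, loading everything at once may exhaust memory; consider lazy_load(), which yields documents one at a time, and split long texts into chunks.
Always check the returned list; depending on the loader, an empty or unreadable file can produce an empty list, a document with empty page_content, or an exception.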

python

# Older releases: from langchain.document_loaders import ...
from langchain_community.document_loaders import PyPDFLoader, TextLoader

# Wrong: TextLoader tries to decode the PDF's binary data as text
loader = TextLoader("document.pdf")  # load() would fail or return garbage

# Right: use a PDF loader for PDFs (PyPDFLoader needs the pypdf package)
loader = PyPDFLoader("document.pdf")
documents = loader.load()  # one Document per page
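The path and empty-result pitfalls can also be guarded against with a thin wrapper. This is a sketch, not part of LangChain: load_checked is a hypothetical helper that accepts any loader class whose constructor takes a file path and whose instances provide load().

```python
from pathlib import Path


def load_checked(loader_cls, path):
    """Load documents, failing early on a bad path or an empty result."""
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f"No such file: {file_path}")

    documents = loader_cls(str(file_path)).load()

    # Drop documents whose text is empty or whitespace only.
    usable = [doc for doc in documents if doc.page_content.strip()]
    if not usable:
        raise ValueError(f"No readable content in {file_path}")
    return usable
```

With LangChain installed, a call such as load_checked(TextLoader, "example.txt") would be the intended usage. The wrapper applies to file-based loaders only, since it checks the local filesystem.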

📊

Quick Reference

Here is a quick summary of common document loaders in LangChain:

Loader	Use Case	Input Type
TextLoader	Load plain text files	File path (.txt)
PyPDFLoader	Load PDF documents	File path (.pdf)
UnstructuredURLLoader	Load documents from URLs	List of web URLs
CSVLoader	Load CSV files	File path (.csv)
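The file-based rows of this table can be turned into a small lookup that picks a loader by file extension. This is a sketch under assumptions: LOADER_BY_SUFFIX, loader_name_for, and make_loader are hypothetical names, and the lazy import assumes the langchain-community package is installed (PyPDFLoader additionally needs pypdf).

```python
import importlib
from pathlib import Path

# Mapping taken from the table above: file suffix -> loader class name.
LOADER_BY_SUFFIX = {
    ".txt": "TextLoader",
    ".pdf": "PyPDFLoader",
    ".csv": "CSVLoader",
}


def loader_name_for(path):
    """Return the loader class name for a path, based on its suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return LOADER_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"No loader registered for {suffix!r}") from None


def make_loader(path):
    """Instantiate the matching loader; imports LangChain only when called."""
    module = importlib.import_module("langchain_community.document_loaders")
    loader_cls = getattr(module, loader_name_for(path))
    return loader_cls(str(path))


print(loader_name_for("report.PDF"))
```

Keeping the name lookup separate from the import means the mapping can be checked without LangChain present, and unsupported extensions fail with a clear message instead of a decoding error later.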

✅

Key Takeaways

Use the appropriate document loader class for your file type to avoid errors.

Call the load() method on the loader instance to get a list of Document objects.

Check file paths and URLs carefully to ensure the loader can access the source.

For large documents, prefer lazy_load() where the loader can stream its output, and split long texts into chunks instead of processing them whole.

LangChain provides many specialized loaders for different document formats.
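For the takeaway about large documents, one option is to split loaded text into overlapping chunks before further processing. The function below is a plain-Python stand-in for a text splitter; chunk_text and its parameters are hypothetical, and LangChain's own splitters (such as RecursiveCharacterTextSplitter) behave differently because they prefer to break on separators rather than at fixed offsets.

```python
def chunk_text(text, chunk_size=1000, overlap=100):
    """Split text into fixed-size character chunks that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

In practice the input would be documents[0].page_content, and the overlap keeps a little shared context between neighbouring chunks.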