How to Use Document Loaders in LangChain: A Simple Guide
In LangChain, use a document loader to load documents from sources such as files or URLs into a standard format. You create an instance of a loader class (e.g., TextLoader), call its load() method, and get back a list of Document objects ready for processing.
Syntax
The basic syntax involves importing a document loader class, creating an instance with the source path or URL, and calling load() to get documents.
- BaseLoader: Base class that document loaders inherit from.
- TextLoader: Loads plain text files.
- load(): Method that reads the source and returns the documents as a list.
python
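# Recent LangChain releases ship loaders in the langchain-community package;
# older releases used: from langchain.document_loaders import TextLoader
from langchain_community.document_loaders import TextLoader

loader = TextLoader("path/to/file.txt")
documents = loader.load()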
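To make the load() contract concrete, here is a minimal sketch of the same pattern in plain Python. `SketchDocument` and `SketchTextLoader` are hypothetical stand-ins written for this guide, not LangChain classes; they only mirror the idea of a loader that returns a list of objects carrying `page_content` and `metadata`.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List
import tempfile


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a loaded document (not LangChain's class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class SketchTextLoader:
    """Hypothetical loader: reads one text file into a one-item list."""

    def __init__(self, file_path: str, encoding: str = "utf-8"):
        self.file_path = file_path
        self.encoding = encoding

    def load(self) -> List[SketchDocument]:
        text = Path(self.file_path).read_text(encoding=self.encoding)
        # Record where the text came from alongside the content itself.
        return [SketchDocument(page_content=text,
                               metadata={"source": self.file_path})]


# Usage: write a temporary file, then load it through the sketch loader.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = str(Path(tmp_dir) / "example.txt")
    Path(sample_path).write_text("Hello from a text file.", encoding="utf-8")
    sketch_docs = SketchTextLoader(sample_path).load()

print(len(sketch_docs), sketch_docs[0].page_content)
```

The real loaders follow the same shape (construct with a source, call load(), iterate over the result), which is why swapping one loader for another usually changes only the import and the constructor call.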
Example
This example shows how to load a simple text file using TextLoader and print the content of the first document.
python
# Older LangChain releases import this from langchain.document_loaders instead
from langchain_community.document_loaders import TextLoader

# Create a loader for a text file
loader = TextLoader("example.txt")

# Load documents (list of Document objects)
documents = loader.load()

# Print the content of the first document
print(documents[0].page_content)
Output
This is the content of example.txt file.
Common Pitfalls
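- An incorrect file path or URL makes the load fail, typically with a file-not-found or connection error.
- A loader that does not match the file type (e.g., TextLoader for PDFs) will either raise a decoding error or return unusable text.
- For large sources, load() holds every document in memory at once; consider iterating with lazy_load() or splitting the content into chunks.
- Always check the returned list; depending on the loader, an empty source may produce an empty list or a document whose page_content is empty.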
python
# Older LangChain releases import these from langchain.document_loaders instead
from langchain_community.document_loaders import PyPDFLoader, TextLoader

# Wrong: TextLoader expects plain text, so a PDF will not load properly
loader = TextLoader("document.pdf")

# Right: use a PDF loader for PDFs (PyPDFLoader requires the pypdf package)
loader = PyPDFLoader("document.pdf")
documents = loader.load()
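For the large-file pitfall, one option is to read and yield the text in pieces rather than holding everything in a single string. The sketch below does this with the standard library only; `iter_text_chunks` and its `chunk_size` parameter are hypothetical names chosen for illustration, and in a real pipeline LangChain's own text splitters may be the better fit.

```python
from pathlib import Path
from typing import Dict, Iterator
import tempfile


def iter_text_chunks(file_path: str, chunk_size: int = 1000,
                     encoding: str = "utf-8") -> Iterator[Dict]:
    """Yield the file as dicts of at most chunk_size characters each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    with open(file_path, "r", encoding=encoding) as handle:
        index = 0
        while True:
            piece = handle.read(chunk_size)  # reads up to chunk_size characters
            if not piece:
                break  # end of file reached
            yield {"page_content": piece,
                   "metadata": {"source": file_path, "chunk": index}}
            index += 1


# Usage: a 250-character file split into chunks of at most 100 characters.
with tempfile.TemporaryDirectory() as tmp_dir:
    big_path = str(Path(tmp_dir) / "big.txt")
    Path(big_path).write_text("abcdefghij" * 25, encoding="utf-8")
    chunks = list(iter_text_chunks(big_path, chunk_size=100))

print([len(c["page_content"]) for c in chunks])
```

Because the function is a generator, only one chunk is held in memory at a time while iterating; wrapping it in list(), as the usage lines do, gives that benefit up and is done here only to inspect the result.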
Quick Reference
Here is a quick summary of common document loaders in LangChain:
| Loader | Use Case | Input Type |
|---|---|---|
| TextLoader | Load plain text files | File path (.txt) |
| PyPDFLoader | Load PDF documents | File path (.pdf) |
| UnstructuredURLLoader | Load documents from URLs | List of web URLs |
| CSVLoader | Load CSV files | File path (.csv) |
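The table above can be expressed as a small lookup that picks a loader by file extension and fails early on a bad path. This is only a sketch: `LOADER_BY_SUFFIX` and `pick_loader_name` are hypothetical helpers, and the function returns the loader's class name as a string so that the snippet runs without LangChain installed.

```python
from pathlib import Path
import tempfile

# Hypothetical mapping from file extension to the loader named in the table.
LOADER_BY_SUFFIX = {
    ".txt": "TextLoader",
    ".pdf": "PyPDFLoader",
    ".csv": "CSVLoader",
}


def pick_loader_name(source: str) -> str:
    """Return the name of a suitable loader for a URL or an existing file."""
    if source.startswith(("http://", "https://")):
        return "UnstructuredURLLoader"
    path = Path(source)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {source}")
    suffix = path.suffix.lower()  # treat .PDF and .pdf the same way
    if suffix not in LOADER_BY_SUFFIX:
        raise ValueError(f"No loader configured for '{suffix}' files")
    return LOADER_BY_SUFFIX[suffix]


# Usage: check a file on disk and a URL.
with tempfile.TemporaryDirectory() as tmp_dir:
    pdf_path = Path(tmp_dir) / "report.PDF"
    pdf_path.write_bytes(b"%PDF-1.4")  # placeholder bytes, not a real PDF
    chosen_for_pdf = pick_loader_name(str(pdf_path))

chosen_for_url = pick_loader_name("https://example.com/page")
print(chosen_for_pdf, chosen_for_url)
```

Validating the path before constructing a loader surfaces the first two pitfalls (wrong path, wrong loader) as clear errors at the call site instead of deep inside a load() call.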
Key Takeaways
- Use the document loader class that matches your file type to avoid errors.
- Call the load() method on the loader instance to get a list of Document objects.
- Check file paths and URLs carefully so the loader can access the source.
- For large documents, iterate lazily or split the content into chunks instead of loading everything at once.
- LangChain provides many specialized loaders for different document formats.