How to Load Documents in LangChain: A Simple Guide

To load documents in LangChain, use one of its built-in document loaders, such as TextLoader or UnstructuredPDFLoader. Each loader reads a file and converts it into one or more Document objects that the rest of LangChain can process.

📐

Syntax

LangChain provides document loaders for many file types. The general pattern has two steps:

loader = SomeLoader(file_path): Create a loader instance with the file path.
documents = loader.load(): Read the file and return a list of Document objects.

Each loader is designed for a specific file format, such as plain text, PDF, or HTML.

python

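# Document loaders live in the langchain-community package
# (older releases exposed them as langchain.document_loaders).
from langchain_community.document_loaders import TextLoader

loader = TextLoader("example.txt")
documents = loader.load()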

💻

Example

This example shows how to load a plain text file using TextLoader and print the content of the first document.

python

from langchain_community.document_loaders import TextLoader

# Create a loader for a text file
loader = TextLoader("sample.txt")

# Load documents
documents = loader.load()

# Print the content of the first document
print(documents[0].page_content)

Output

This is the content of the sample.txt file.
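
A Document carries more than its text. As a minimal sketch, assuming the langchain-community package is installed, the helper below loads a text file and returns the two fields most code relies on: page_content and metadata. The name describe_first_document is hypothetical and not part of LangChain. TextLoader records the file path under the "source" metadata key.

```python
def describe_first_document(file_path):
    """Return (text, metadata) for the first Document loaded from a text file."""
    # Imported here so the sketch only needs langchain-community when called.
    from langchain_community.document_loaders import TextLoader

    # Passing an explicit encoding avoids relying on the platform default.
    loader = TextLoader(file_path, encoding="utf-8")
    documents = loader.load()

    first = documents[0]
    # page_content holds the file text; metadata is a plain dict.
    return first.page_content, first.metadata
```

Calling describe_first_document("sample.txt") would return the file's text together with a dictionary such as {"source": "sample.txt"}.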

⚠️

Common Pitfalls

Common mistakes when loading documents in LangChain include:

Using the wrong loader for the file type (e.g., using TextLoader for PDFs).
Passing an incorrect file path, or pointing at a file that does not exist.
Forgetting to call the load() method, which is the step that actually reads the file.

Always check the file format and use the matching loader.

python

from langchain_community.document_loaders import TextLoader, UnstructuredPDFLoader

# Wrong way: Using TextLoader for a PDF file
loader_wrong = TextLoader("document.pdf")
# A PDF is binary, so this raises a decoding error or yields unreadable text
documents_wrong = loader_wrong.load()

# Right way: Use UnstructuredPDFLoader for PDFs
# (requires the separate unstructured package with PDF support)
loader_right = UnstructuredPDFLoader("document.pdf")
documents_right = loader_right.load()
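
The first two pitfalls can be caught before any loader runs. The following is a rough sketch, assuming loaders are imported from langchain_community.document_loaders. The names LOADER_NAMES, pick_loader_name, and load_file are hypothetical helpers for illustration, and the extension map covers only a few of the loaders mentioned in this guide.

```python
from pathlib import Path

# Hypothetical map from file suffix to a loader class name in
# langchain_community.document_loaders; extend it as needed.
LOADER_NAMES = {
    ".txt": "TextLoader",
    ".pdf": "UnstructuredPDFLoader",
    ".csv": "CSVLoader",
}


def pick_loader_name(file_path):
    """Return the loader class name that matches the file's extension."""
    suffix = Path(file_path).suffix.lower()
    if suffix not in LOADER_NAMES:
        raise ValueError(f"No loader configured for {suffix!r} files")
    return LOADER_NAMES[suffix]


def load_file(file_path):
    """Check the path, choose a loader by extension, and load the file."""
    path = Path(file_path)
    if not path.is_file():
        # Fail early with a clear message instead of inside the loader.
        raise FileNotFoundError(f"No such file: {path}")

    loader_name = pick_loader_name(path)

    # Imported lazily so the checks above work without LangChain installed.
    import langchain_community.document_loaders as loaders

    loader_cls = getattr(loaders, loader_name)
    return loader_cls(str(path)).load()
```

With this in place, load_file("document.pdf") would go to UnstructuredPDFLoader, while a misspelled path fails immediately with FileNotFoundError.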

📊

Quick Reference

Here is a quick reference for common LangChain document loaders:

Loader	File Type	Description
TextLoader	Plain text (.txt)	Loads plain text files as documents.
UnstructuredPDFLoader	PDF (.pdf)	Loads PDF files using the unstructured library.
CSVLoader	CSV (.csv)	Loads CSV files, creating one document per row.
UnstructuredHTMLLoader	HTML (.html)	Loads HTML files, extracting text content.
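
To make the CSVLoader entry concrete, here is a small sketch, assuming langchain-community is installed. It writes a temporary CSV file and loads it so that each data row becomes its own Document. The function name load_csv_rows and the file name people.csv are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path


def load_csv_rows(rows, header):
    """Write rows to a temporary CSV file and load it with CSVLoader."""
    # Imported lazily so the sketch only needs langchain-community when called.
    from langchain_community.document_loaders import CSVLoader

    with tempfile.TemporaryDirectory() as tmp_dir:
        csv_path = Path(tmp_dir) / "people.csv"
        with open(csv_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)

        # One Document per data row; the header supplies the field names.
        loader = CSVLoader(file_path=str(csv_path), encoding="utf-8")
        return loader.load()
```

For example, load_csv_rows([["Ada", "engineer"], ["Linus", "maintainer"]], header=["name", "role"]) would return two Document objects, each pairing the column names with that row's values in page_content.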

✅

Key Takeaways

Use the LangChain loader that matches your file type to ensure the document loads properly.

Always call the load() method on the loader instance to get Document objects.

Check file paths carefully to avoid file-not-found errors.

LangChain supports many loaders, such as TextLoader for text and UnstructuredPDFLoader for PDFs.

Loaded documents are returned as a list of Document objects whose text is available through page_content.