Custom document loaders help you bring in data from places not covered by built-in loaders. They let you read and prepare your own files or sources for your app.
Custom document loaders in LangChain
Introduction
You have files in a special format that no default loader supports.
You want to load documents from a private database or API.
You need to preprocess or clean data before using it.
You want to combine multiple sources into one loader.
You want to add custom metadata while loading documents.
Syntax
LangChain
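# Current import paths; older releases exposed BaseLoader under
# langchain.document_loaders.base instead.
from langchain_core.document_loaders import BaseLoader
from langchain_core.documents import Document


class MyLoader(BaseLoader):
    def __init__(self, source_path: str):
        self.source_path = source_path

    def load(self) -> list[Document]:
        # Read and process your data here.
        with open(self.source_path, "r", encoding="utf-8") as f:
            text = f.read()
        # Wrap the text in a Document; add metadata as needed.
        return [Document(page_content=text, metadata={})]
Custom loaders must inherit from BaseLoader and implement a load method.
The load method returns a list of Document objects, each with the text in page_content and a metadata dict, which may be empty. Return Document objects rather than plain dicts, because downstream components such as text splitters read these attributes.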
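Recent LangChain releases also let a loader implement lazy_load() as a generator; the inherited load() then collects the results into a list. The following is a minimal sketch of that variant, assuming a langchain-core version that provides these import paths. The except branch defines small stand-ins (not part of LangChain) so the sketch still runs without the library, and LineLoader and its "line" metadata key are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator

try:
    # Real classes, used when langchain-core is installed.
    from langchain_core.document_loaders import BaseLoader
    from langchain_core.documents import Document
except ImportError:
    # Stand-ins (assumption) so the sketch runs without LangChain.
    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)

    class BaseLoader:
        def load(self):
            # Mirrors the default behaviour: collect lazy_load() into a list.
            return list(self.lazy_load())


class LineLoader(BaseLoader):
    """Hypothetical loader that yields one Document per non-empty line."""

    def __init__(self, source_path: str):
        self.source_path = source_path

    def lazy_load(self) -> Iterator[Document]:
        with open(self.source_path, "r", encoding="utf-8") as f:
            for line_number, line in enumerate(f, start=1):
                text = line.strip()
                if text:  # skip blank lines
                    yield Document(
                        page_content=text,
                        metadata={"source": self.source_path, "line": line_number},
                    )
```

Yielding documents one at a time keeps memory use low for large files, while callers that want a list can still call load().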
Examples
Loads a plain text file and returns its content as one document.
LangChain
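from langchain_core.document_loaders import BaseLoader
from langchain_core.documents import Document


class TxtLoader(BaseLoader):
    def __init__(self, filepath: str):
        self.filepath = filepath

    def load(self) -> list[Document]:
        # Read the whole file and return it as a single Document.
        with open(self.filepath, "r", encoding="utf-8") as f:
            text = f.read()
        return [Document(page_content=text, metadata={})]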
Loads JSON data and creates documents from each item with metadata.
LangChain
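import json

from langchain_core.document_loaders import BaseLoader
from langchain_core.documents import Document


class JsonLoader(BaseLoader):
    def __init__(self, filepath: str):
        self.filepath = filepath

    def load(self) -> list[Document]:
        with open(self.filepath, "r", encoding="utf-8") as f:
            data = json.load(f)
        # Expects a top-level "items" list whose entries have "id" and "text".
        return [
            Document(page_content=item["text"], metadata={"id": item["id"]})
            for item in data["items"]
        ]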
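The JsonLoader example assumes a file whose top level holds an items list, with an id and a text field in each item. The sketch below writes such a file for experimentation; the helper name write_sample and the sample values are made up for illustration.

```python
import json

# Sample data in the shape the JsonLoader example reads:
# a top-level "items" list whose entries carry "id" and "text".
SAMPLE = {
    "items": [
        {"id": 1, "text": "First document."},
        {"id": 2, "text": "Second document."},
    ]
}


def write_sample(path: str) -> None:
    """Write SAMPLE to the given path as UTF-8 encoded JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(SAMPLE, f, ensure_ascii=False, indent=2)
```

Loading a file written this way should produce two documents, one per item, each carrying its id in the metadata.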
Sample Program
This example shows a simple custom loader that reads a text file and adds the file path as metadata. It prints the content and metadata of the loaded document.
LangChain
from langchain_core.document_loaders import BaseLoader
from langchain_core.documents import Document


class SimpleTxtLoader(BaseLoader):
    def __init__(self, filepath: str):
        self.filepath = filepath

    def load(self) -> list[Document]:
        with open(self.filepath, "r", encoding="utf-8") as f:
            text = f.read()
        # Record the file path so the document can be traced to its source.
        return [Document(page_content=text, metadata={"source": self.filepath})]


# Usage example
loader = SimpleTxtLoader("example.txt")
docs = loader.load()
for doc in docs:
    print(f"Content:\n{doc.page_content}")
    print(f"Metadata: {doc.metadata}")
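One way to exercise a loader like this with several inputs is a small check function that writes each sample to a temporary file, loads it, and validates the result. This is a sketch using only the standard library; run_loader_checks is a hypothetical helper, and it assumes load() returns objects with page_content and metadata attributes (Document objects), not plain dicts.

```python
import tempfile
from pathlib import Path


def run_loader_checks(make_loader, samples):
    """Write each sample to a temp file, load it, and validate the documents.

    make_loader: callable taking a file path and returning a loader.
    samples: dict mapping a case name to the text to write.
    Returns a dict mapping each case name to the number of documents loaded.
    """
    counts = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, text in samples.items():
            path = Path(tmp) / f"{name}.txt"
            path.write_text(text, encoding="utf-8")
            docs = make_loader(str(path)).load()
            assert isinstance(docs, list), f"{name}: load() must return a list"
            for doc in docs:
                assert isinstance(doc.page_content, str), f"{name}: bad content"
                assert isinstance(doc.metadata, dict), f"{name}: bad metadata"
            counts[name] = len(docs)
    return counts
```

Passing a loader class together with cases such as an empty file, non-ASCII text, and multi-line text, for example run_loader_checks(SimpleTxtLoader, {"empty": "", "unicode": "café"}), would fail with an error if the loader returned dicts or anything other than a list.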
Important Notes
Make sure your custom loader handles file encoding and errors gracefully.
Adding metadata such as the source path records where each document came from, which helps later when filtering results or citing sources.
Test your loader with different inputs to ensure it works as expected.
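For the note on encoding and errors, one approach is to try a short list of encodings and to skip files that cannot be read at all. The sketch below uses only the standard library; read_text_safely and the default encoding list are assumptions for illustration, not part of LangChain.

```python
import logging
from pathlib import Path
from typing import Optional, Sequence

logger = logging.getLogger(__name__)


def read_text_safely(
    path: str, encodings: Sequence[str] = ("utf-8", "cp1252")
) -> Optional[str]:
    """Return the file's text, trying each encoding in order.

    Returns None if the file cannot be opened or read.
    """
    try:
        raw = Path(path).read_bytes()
    except OSError as exc:
        # Missing file, permission problem, and similar I/O errors.
        logger.warning("Skipping %s: %s", path, exc)
        return None

    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding

    # Last resort: keep the readable parts and mark undecodable bytes.
    return raw.decode("utf-8", errors="replace")
```

A loader's load method could call a helper like this and skip the file when it returns None, so one bad file does not abort the whole load.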
Summary
Custom document loaders let you bring in data from any source you want.
They must inherit from BaseLoader and implement a load method.
Use them to read, clean, and add metadata to your documents before using them.