A directory loader helps you quickly load many documents from a folder all at once. It saves time by handling multiple files together instead of one by one.
Directory loader for bulk documents in LangChain
Introduction
You have a folder full of text files you want to process together.
You want to read many PDFs or documents from a directory for analysis.
You need to prepare a large set of documents for a language model.
You want to automate loading all files in a folder without manual steps.
Syntax
LangChain
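# Import path for older LangChain releases; newer releases expose the
# same class from langchain_community.document_loaders
from langchain.document_loaders import DirectoryLoader

loader = DirectoryLoader('path/to/folder', glob='**/*.txt')
documents = loader.load()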
The DirectoryLoader takes a folder path and an optional glob pattern to select file types.
The load() method reads all matching files and returns a list of documents.
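Conceptually, the loader combines a glob search with a per-file read. The sketch below imitates that idea using only the Python standard library. It is not LangChain's implementation: SimpleDocument and load_text_directory are hypothetical names invented for illustration, and the sketch assumes every matched file is UTF-8 text.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document (not a LangChain class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_directory(folder, glob="**/*.txt"):
    """Read every file under `folder` that matches `glob` into a list."""
    root = Path(folder)
    documents = []
    # sorted() keeps the order of the returned list stable across runs
    for path in sorted(root.glob(glob)):
        if path.is_file():
            text = path.read_text(encoding="utf-8")
            documents.append(SimpleDocument(text, {"source": str(path)}))
    return documents
```

The real DirectoryLoader delegates each file to a dedicated file loader, so it can handle formats that a plain read_text call cannot.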
Examples
This loads all PDF files inside 'data/docs' and its subfolders.
LangChain
from langchain.document_loaders import DirectoryLoader

loader = DirectoryLoader('data/docs', glob='**/*.pdf')
docs = loader.load()
This loads all Markdown files directly inside the 'notes' folder.
LangChain
loader = DirectoryLoader('notes', glob='*.md')
docs = loader.load()
This loads every file in the 'articles' folder using the default glob pattern, '**/[!.]*', which matches all non-hidden files, including those in subfolders.
LangChain
loader = DirectoryLoader('articles')
docs = loader.load()
Sample Program
This example loads all text files from the 'my_documents' folder and prints how many were loaded. It also shows a preview of the first document's content.
LangChain
from langchain.document_loaders import DirectoryLoader

# Create a loader for all text files in 'my_documents'
loader = DirectoryLoader('my_documents', glob='**/*.txt')

# Load all documents
documents = loader.load()

# Print the number of documents loaded
print(f"Loaded {len(documents)} documents.")

# Print the first 100 characters of the first document
if documents:
    print("First document preview:")
    print(documents[0].page_content[:100])
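The sample program needs a 'my_documents' folder containing text files. If you do not have one yet, a small test folder can be created with the standard library. This is only a sketch: make_sample_folder, the file names, and the file contents are made up for illustration.

```python
from pathlib import Path


def make_sample_folder(root="my_documents"):
    """Create a small folder of text files to try the sample program on."""
    base = Path(root)
    # One nested folder, so the '**/*.txt' pattern has a subfolder to search
    (base / "archive").mkdir(parents=True, exist_ok=True)
    files = {
        base / "intro.txt": "This is the first sample document.",
        base / "notes.txt": "A second file with a few words in it.",
        base / "archive" / "old.txt": "An older file stored in a subfolder.",
    }
    for path, text in files.items():
        path.write_text(text, encoding="utf-8")
    # Return the created paths so the caller can inspect or clean them up
    return sorted(files)
```

Calling make_sample_folder() with no arguments creates 'my_documents' in the current working directory with three text files, one of them in a subfolder.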
Important Notes
Make sure the folder path is correct and accessible.
The glob pattern helps filter file types, like '*.txt' or '**/*.pdf'.
Documents are returned as a list of objects with a page_content attribute holding the text.
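The first two notes can be checked before any documents are loaded. The sketch below uses Python's pathlib to confirm that the folder exists and to list what a glob pattern would match. It assumes the loader interprets patterns the same way pathlib does, which the notes above do not state, and preview_matches is a hypothetical helper name.

```python
from pathlib import Path


def preview_matches(folder, pattern):
    """Return the relative paths under `folder` that match a glob `pattern`."""
    root = Path(folder)
    # Fail early with a clear message if the folder is missing or not a folder
    if not root.is_dir():
        raise NotADirectoryError(f"Not an accessible folder: {root}")
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )
```

For a folder 'notes' that contains a.md and sub/b.md, preview_matches('notes', '*.md') returns ['a.md'], while preview_matches('notes', '**/*.md') returns ['a.md', 'sub/b.md']. The difference is the '**/' prefix, which makes the pattern descend into subfolders.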
Summary
DirectoryLoader loads many files from a folder at once.
Use glob to pick specific file types.
Call load() to get all documents as a list.