What is the directory loader for bulk documents in LangChain?




Introduction

A directory loader loads many documents from a folder in a single call, so you do not have to open and parse each file one by one. Typical situations where it helps:

You have a folder full of text files you want to process together.

You want to read many PDFs or documents from a directory for analysis.

You need to prepare a large set of documents for a language model.

You want to automate loading all files in a folder without manual steps.

Syntax


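# Current releases ship the loaders in langchain_community;
# older releases exposed the same class as langchain.document_loaders.
from langchain_community.document_loaders import DirectoryLoader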

loader = DirectoryLoader('path/to/folder', glob='**/*.txt')
documents = loader.load()

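DirectoryLoader takes a folder path and an optional glob pattern that selects which files to load. Each matching file is parsed by the class passed as loader_cls, which defaults to UnstructuredFileLoader and therefore needs the unstructured package installed.

The load() method reads every matching file and returns a list of Document objects.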
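To make the idea concrete, here is a minimal standard-library sketch of what a directory loader does conceptually: find the files that match a pattern, read each one, and wrap the text in a small document object. It is not LangChain's implementation; SimpleDocument and load_directory are hypothetical names used only for illustration, and the sketch assumes every file is UTF-8 text.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_directory(folder: str, glob: str = "**/*.txt") -> list:
    """Read every file under `folder` that matches `glob` as UTF-8 text."""
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"Directory not found: {folder!r}")
    documents = []
    # Sort the matches so the result order is deterministic.
    for path in sorted(root.glob(glob)):
        if path.is_file():
            text = path.read_text(encoding="utf-8")
            # Record where the text came from, one document per file.
            documents.append(SimpleDocument(text, {"source": str(path)}))
    return documents
```

The real DirectoryLoader adds what this sketch leaves out, such as pluggable parsers for non-text formats and options for error handling.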

Examples

This loads all PDF files inside 'data/docs' and its subfolders.


from langchain_community.document_loaders import DirectoryLoader

loader = DirectoryLoader('data/docs', glob='**/*.pdf')
docs = loader.load()

This loads all Markdown files directly inside the 'notes' folder.


loader = DirectoryLoader('notes', glob='*.md')
docs = loader.load()

This loads the files in the 'articles' folder using the default pattern, '**/[!.]*', which matches every non-hidden file in the folder and its subfolders.


loader = DirectoryLoader('articles')
docs = loader.load()
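The three patterns used in these examples can be tried out without LangChain at all. The sketch below assumes that DirectoryLoader resolves its glob argument with pathlib-style matching, and uses plain pathlib on a throwaway folder; the match helper and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def match(root: Path, pattern: str) -> list:
    """Return the files under `root` matching `pattern`, as sorted relative paths."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )


# Build a small throwaway tree (hypothetical file names).
root = Path(tempfile.mkdtemp())
(root / "archive").mkdir()
for name in ("todo.md", "plan.txt", ".cache", "archive/old.md"):
    (root / name).write_text("example", encoding="utf-8")

top_level_md = match(root, "*.md")      # files directly inside the folder
recursive_md = match(root, "**/*.md")   # the folder plus all of its subfolders
non_hidden = match(root, "**/[!.]*")    # any file whose name does not start with a dot

print(top_level_md)
print(recursive_md)
print(non_hidden)
```

Checking a pattern this way before handing it to the loader is a cheap way to confirm that it selects the files you expect.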

Sample Program

This example loads all text files from the 'my_documents' folder and prints how many were loaded. It also shows a preview of the first document's content.


from langchain_community.document_loaders import DirectoryLoader

# Create a loader for all text files in 'my_documents'
loader = DirectoryLoader('my_documents', glob='**/*.txt')

# Load all documents
documents = loader.load()

# Print the number of documents loaded
print(f"Loaded {len(documents)} documents.")

# Print the first 100 characters of the first document
if documents:
    print("First document preview:")
    print(documents[0].page_content[:100])
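For folders that contain only plain text, one possible variant is to pass TextLoader as loader_cls, which avoids the extra dependency of the default parser. The sketch below assumes the langchain_community package is installed and that the files are UTF-8 encoded; load_text_folder is a hypothetical helper name, not part of LangChain.

```python
def load_text_folder(folder: str, pattern: str = "**/*.txt"):
    """Load matching files as plain UTF-8 text, one Document per file."""
    # Imported lazily so this module can be imported without LangChain installed.
    from langchain_community.document_loaders import DirectoryLoader, TextLoader

    loader = DirectoryLoader(
        folder,
        glob=pattern,
        loader_cls=TextLoader,                # parse each file as plain text
        loader_kwargs={"encoding": "utf-8"},  # forwarded to TextLoader
    )
    return loader.load()
```

Calling load_text_folder('my_documents') should return the same kind of list as the sample program, with each file's raw text in page_content.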


Important Notes

Make sure the folder path is correct and readable; load() raises FileNotFoundError when the directory does not exist.

The glob pattern helps filter file types, like '*.txt' or '**/*.pdf'.

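Documents are returned as a list of Document objects. Each has a page_content attribute holding the text and a metadata dictionary, which typically records the file path under the 'source' key.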
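Once the documents are loaded, their attributes can be used to check what actually came in. The sketch below counts documents per file extension; it assumes each document carries its file path in metadata['source'], and it uses stand-in objects with hypothetical data so that it runs without LangChain. count_by_extension is a hypothetical helper name.

```python
from collections import Counter
from pathlib import Path
from types import SimpleNamespace


def count_by_extension(documents) -> Counter:
    """Count documents per file extension, based on metadata['source']."""
    counts = Counter()
    for doc in documents:
        source = doc.metadata.get("source", "")
        # Normalize case so '.TXT' and '.txt' are counted together.
        counts[Path(source).suffix.lower() or "<none>"] += 1
    return counts


# Stand-in objects with the same two attributes (hypothetical data).
sample = [
    SimpleNamespace(page_content="First note", metadata={"source": "notes/a.txt"}),
    SimpleNamespace(page_content="Second note", metadata={"source": "notes/b.TXT"}),
    SimpleNamespace(page_content="Report", metadata={"source": "notes/r.pdf"}),
]
print(count_by_extension(sample))
```

The same function should work on the list returned by load(), since it only relies on the metadata attribute.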

Summary

DirectoryLoader loads many files from a folder at once.

Use glob to pick specific file types.

Call load() to get all documents as a list.