[Solved] Why does PyPDFLoader return a list of documents instead of a single string when loading a PDF? — Ans: Because it treats each PDF page as a separate document chunk | LangChain

LangChain - Document Loading

Why does PyPDFLoader return a list of documents instead of a single string when loading a PDF?

A. Because PDFs cannot be converted to text

B. Because it treats each PDF page as a separate document chunk

C. Because it loads only the first page by default

D. Because it merges all pages into one document internally

Step-by-Step Solution

Solution:

Step 1: Understand PyPDFLoader's internal design
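It reads the PDF page by page and returns a list of `Document` objects, one per page. Each object holds that page's text in `page_content` and details such as the source file and page number in `metadata`.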
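With the real library, `PyPDFLoader("report.pdf").load()` (imported from `langchain_community.document_loaders`) returns this list. The sketch below does not call LangChain. It uses a hypothetical `Document` dataclass and a hypothetical `load_pages` helper with hard-coded page strings to show the shape of the result: one object per page, each carrying `page_content` and `metadata`. The `source` and `page` metadata keys mirror what PyPDFLoader typically records, but check them against your installed version.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in mirroring the fields of LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_pages(page_texts: list[str], source: str) -> list[Document]:
    """Sketch of page-wise loading: one Document per page, in page order.

    `page_texts` stands in for the text a PDF parser extracts from each page.
    """
    return [
        Document(page_content=text, metadata={"source": source, "page": i})
        for i, text in enumerate(page_texts)
    ]


docs = load_pages(["Intro text", "Methods text", "Results text"], source="report.pdf")
print(len(docs))             # 3: one Document per page
print(docs[1].metadata)      # {'source': 'report.pdf', 'page': 1}
print(docs[1].page_content)  # Methods text
```

Because every page is kept separate, its page number travels with its text. This is why the result is a list and not a single string.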
Step 2: Evaluate other options
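Option A is wrong because PyPDFLoader does extract text from PDFs. Option C is wrong because it loads every page, not just the first. Option D is wrong because, by default, the pages are kept as separate documents and are not merged.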
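Because the loader leaves merging to the caller, getting a single string takes one join over the returned list. This sketch again uses a hypothetical `Document` stand-in and a hypothetical `merge_pages` helper instead of LangChain itself. The double-newline separator and the defensive sort by the `page` metadata key are illustrative choices, not library behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def merge_pages(docs: list[Document], separator: str = "\n\n") -> str:
    """Join per-page Documents into one string, ordered by their 'page' metadata."""
    ordered = sorted(docs, key=lambda d: d.metadata.get("page", 0))
    return separator.join(d.page_content for d in ordered)


pages = [
    Document("Page two", {"page": 1}),
    Document("Page one", {"page": 0}),
]
print(merge_pages(pages))  # "Page one", a blank line, then "Page two"
```

A merged string loses the per-page metadata, so keeping the list is usually preferable when you need to cite page numbers later.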
Final Answer:
Because it treats each PDF page as a separate document chunk -> Option B
Quick Check:
Page-wise splitting explains the list output.

Quick Trick: Each page becomes one Document in the returned list.

