LangChain - Document Loading

Why does PyPDFLoader return a list of documents instead of a single string when loading a PDF?

A. Because PDFs cannot be converted to text
B. Because it treats each PDF page as a separate document chunk
C. Because it loads only the first page by default
D. Because it merges all pages into one document internally
Step-by-Step Solution

Step 1: Understand PyPDFLoader's internal design
It splits the PDF by page and returns a list of document objects, one per page.

Step 2: Evaluate the other options
PDFs can be converted to text (A is wrong), the loader reads every page rather than only the first (C is wrong), and by default it does not merge the pages into one document (D is wrong).

Final Answer: Because it treats each PDF page as a separate document chunk -> Option B

Quick Check: Page-wise chunking explains the list output [OK]

Quick Trick: Each page becomes a document chunk in the list [OK]

Common Mistakes:
- Thinking PDFs can't be converted to text
- Assuming only the first page loads
- Believing PyPDFLoader merges pages internally
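
The page-wise behavior described in the solution can be illustrated with a small sketch. This is not LangChain's implementation: the `Document` dataclass, the `load_pages` helper, the file name `report.pdf`, and the page texts are all hypothetical stand-ins, chosen so the example runs without a real PDF or any third-party package. It only shows the shape of the result: one document per page, each carrying its page index in metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a loader's document object (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_pages(source, page_texts):
    """Mimic page-wise loading: return one Document per page, tagged with its index."""
    return [
        Document(page_content=text, metadata={"source": source, "page": i})
        for i, text in enumerate(page_texts)
    ]


# Assumed, made-up text for a three-page PDF.
docs = load_pages("report.pdf", ["Intro text", "Methods text", "Results text"])

# Because the result is a list, individual pages can be indexed directly...
second_page = docs[1]

# ...or joined back into a single string when one block of text is needed.
full_text = "\n".join(doc.page_content for doc in docs)
```

With the real loader, the equivalent call is typically `PyPDFLoader("some.pdf").load()`, which likewise returns a list whose items expose `page_content` and `metadata`. Keeping pages separate preserves the page number for citation and lets a later splitting step work on smaller units.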