
Metadata preservation during splitting in LangChain - Step-by-Step Execution

Concept Flow - Metadata preservation during splitting
Input Document with Metadata → Split Document into Chunks → Copy Metadata to Each Chunk → Output: List of Chunks with Metadata
The document is split into smaller parts, and metadata is copied to each part to keep context.
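The flow above can be sketched in plain Python, without LangChain installed. This is a simplified model, not LangChain's implementation: the `Doc` dataclass and the `split_with_metadata` helper are hypothetical names, and the splitting rule (break on spaces, then pack words up to `chunk_size` characters) is an assumption chosen to mirror the example used on this page.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_with_metadata(doc: Doc, chunk_size: int) -> list[Doc]:
    """Pack space-separated words into chunks of up to chunk_size characters
    (a single longer word becomes its own oversized chunk), and give every
    chunk its own copy of the source document's metadata."""
    pieces: list[str] = []
    current = ""
    for word in doc.page_content.split():
        candidate = f"{current} {word}" if current else word
        if current and len(candidate) > chunk_size:
            pieces.append(current)  # current chunk is full, start a new one
            current = word
        else:
            current = candidate
    if current:
        pieces.append(current)
    # Copy the metadata dict so chunks do not share one mutable object.
    return [Doc(page_content=p, metadata=dict(doc.metadata)) for p in pieces]


doc = Doc(page_content="Hello world", metadata={"source": "file1"})
chunks = split_with_metadata(doc, chunk_size=5)
for chunk in chunks:
    print(repr(chunk.page_content), chunk.metadata)
```

Each printed chunk carries the same `{'source': 'file1'}` metadata as the input document, which is the behavior the flow describes.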
Execution Sample
LangChain
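from langchain_core.documents import Document
from langchain_text_splitters import CharacterTextSplitter
doc = Document(page_content='Hello world', metadata={'source': 'file1'})
# Split on spaces; chunk_overlap must not exceed chunk_size (its default is 200)
splitter = CharacterTextSplitter(separator=' ', chunk_size=5, chunk_overlap=0)
chunks = splitter.split_documents([doc])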
Splits a document into chunks while keeping the original metadata on each chunk.
Execution Table
| Step | Action | Input / Result | Chunks Created | Metadata on Chunks |
|---|---|---|---|---|
| 1 | Start with one document | {content: 'Hello world', metadata: {'source': 'file1'}} | 0 | N/A |
| 2 | Split content into chunks of size 5 | Content split into ['Hello', 'world'] | 2 | N/A |
| 3 | Assign metadata to each chunk | Each chunk gets metadata {'source': 'file1'} | 2 | All chunks have {'source': 'file1'} |
| 4 | Output chunks with metadata | Chunks ready for use | 2 | Metadata preserved on all chunks |
💡 All chunks created and metadata copied, splitting complete
Variable Tracker
| Variable | Start | After Split | Final |
|---|---|---|---|
| doc | {content: 'Hello world', metadata: {'source': 'file1'}} | Same | Same |
| chunks | [] | [{content: 'Hello'}, {content: 'world'}] | [{content: 'Hello', metadata: {'source': 'file1'}}, {content: 'world', metadata: {'source': 'file1'}}] |
Key Moments - 2 Insights
Why does each chunk need its own copy of metadata?
Because after splitting, each chunk is treated as a separate document. Without its own copy of the metadata, a chunk would lose the link to its origin (here, the source file), so the metadata is copied to every chunk, as shown in step 3 of the Execution Table.
Does splitting change the original document's metadata?
No, the original document's metadata stays the same. The metadata is copied to the new chunk objects; it is not removed from or altered on the original, as the doc row of the Variable Tracker shows.
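The two answers above can be illustrated with plain dictionaries. This is a sketch under assumptions: the variable names, the nested `tags` field, and the `chunk_index` key are hypothetical, and `copy.deepcopy` is shown as one way to give each chunk an independent copy, not as a statement of how any particular splitter is implemented.

```python
import copy

# Metadata of the original document (the nested list is for illustration only).
original_metadata = {"source": "file1", "tags": ["intro"]}

# Each chunk gets a deep copy, so nested values are independent too.
chunk_metadatas = [copy.deepcopy(original_metadata) for _ in range(2)]

# Enrich one chunk with its own field and change a nested value.
chunk_metadatas[0]["chunk_index"] = 0
chunk_metadatas[0]["tags"].append("first")

print(original_metadata)   # unchanged by the edits to chunk 0
print(chunk_metadatas[0])  # carries the extra field and the extra tag
print(chunk_metadatas[1])  # still identical to the original
```

Because every chunk owns its copy, later per-chunk additions such as an index or page number do not leak back into the original document or into sibling chunks.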
Visual Quiz - 3 Questions
Test your understanding
Look at step 2 of the Execution Table. How many chunks are created after splitting?
A. 1
B. 2
C. 3
D. 0
💡 Hint
Check the 'Chunks Created' column at step 2 in the Execution Table.
According to the Variable Tracker, what is the metadata of the final chunks?
A. Empty dictionary {}
B. No metadata
C. {'source': 'file1'}
D. Metadata removed
💡 Hint
Look at the 'chunks' row under the 'Final' column in the Variable Tracker.
If the chunk size were increased to 11, how would the number of chunks change?
A. Fewer chunks than before
B. More chunks than before
C. Same number of chunks
D. No chunks created
💡 Hint
Refer to how chunk_size affects splitting in the Execution Sample code.
Concept Snapshot
Metadata preservation during splitting:
- Input document has content + metadata
- Split content into smaller chunks
- Copy metadata to each chunk
- Output chunks keep original metadata
- Keeps context for each piece after splitting
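The points above hold regardless of how many chunks the split produces. The sketch below varies the chunk size and attaches a metadata copy to every resulting chunk. It is a simplified model, not LangChain's splitter: `pack_words` is a hypothetical helper, and the space-based packing rule is an assumption.

```python
def pack_words(text: str, chunk_size: int) -> list[str]:
    """Greedily pack space-separated words into chunks of up to chunk_size
    characters (a longer single word is kept whole as its own chunk)."""
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if current and len(candidate) > chunk_size:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


metadata = {"source": "file1"}
results = {}
for size in (5, 11):
    # Every chunk, however many there are, carries a copy of the metadata.
    results[size] = [
        {"page_content": text, "metadata": dict(metadata)}
        for text in pack_words("Hello world", size)
    ]
    print(size, len(results[size]), results[size])
```

In this model a larger chunk size yields fewer chunks, while the metadata on each chunk is unaffected by the chunk count.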
Full Transcript
This visual execution shows how a document with metadata is split into smaller chunks while each chunk keeps that metadata. First, the document content is split into parts based on the chunk size. Then the metadata from the original document is copied to each chunk so that its context, such as the source file, is preserved. The Execution Table traces each step: starting with one document, splitting the content, assigning metadata, and outputting the chunks. The Variable Tracker shows how the chunks list changes from empty to holding chunks with metadata. The Key Moments explain why copying metadata is necessary and confirm that the original document's metadata remains unchanged. The quiz tests understanding of chunk count, metadata presence, and the effect of chunk size. Together, these help beginners see how metadata preservation works during document splitting in LangChain.