LangChain framework, ~30 mins

Metadata preservation during splitting in LangChain - Mini Project: Build & Apply

Metadata preservation during splitting
📖 Scenario: You are building a document processing tool using LangChain. You have a document with text and metadata. You want to split the document into smaller chunks but keep the metadata attached to each chunk.
🎯 Goal: Create a Python script that uses LangChain's CharacterTextSplitter to split a document while preserving its metadata in each chunk.
📋 What You'll Learn
Create a Document object with specific text and metadata
Create a CharacterTextSplitter with a chunk size of 10
Use the splitter to split the document into chunks
Ensure each chunk keeps the original metadata
💡 Why This Matters
🌍 Real World
When processing large documents for search or analysis, splitting text into smaller parts while keeping metadata helps maintain context and source information.
💼 Career
This skill is useful for developers working on document processing, search engines, chatbots, or any application that handles large text data with metadata.
Step 1: Create the initial Document with text and metadata
Create a Document object named doc with the text 'Hello world! This is a test document.' and metadata {'source': 'test_source'}.
Hint: Use Document(page_content=..., metadata=...) to create the document.
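
A minimal sketch of this step follows. It assumes that recent LangChain releases expose Document from the langchain_core.documents module. So that the snippet also runs where LangChain is not installed, it falls back to a small hypothetical stand-in dataclass with the same two fields; the stand-in is an illustration only and is not part of LangChain.

```python
try:
    # Assumed import path for recent LangChain releases.
    from langchain_core.documents import Document
except ImportError:
    from dataclasses import dataclass, field

    @dataclass
    class Document:
        """Hypothetical stand-in with the same two fields as LangChain's Document."""
        page_content: str
        metadata: dict = field(default_factory=dict)

# The document text and metadata come straight from the exercise.
doc = Document(
    page_content="Hello world! This is a test document.",
    metadata={"source": "test_source"},
)

print(doc.page_content)
print(doc.metadata)
```

The metadata dictionary can hold any keys you need later, such as a source name or a page number.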

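Step 2: Create a CharacterTextSplitter with chunk size 10
Create a CharacterTextSplitter object named splitter with chunk_size=10. In current LangChain releases the splitter's default chunk_overlap (200) is larger than 10, which the constructor rejects, and its default separator is a blank line ('\n\n'), which the sample text does not contain. Also pass chunk_overlap=0 and separator=' ' so the text can actually be split into several chunks.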
Hint: Import CharacterTextSplitter from langchain.text_splitter (recent releases provide it in the langchain_text_splitters package) and set chunk_size=10.
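
To make the effect of chunk_size=10 concrete, here is a rough pure-Python sketch of greedy packing: split the text on a separator, then join pieces back together while the result still fits in chunk_size characters. The function pack_words is a hypothetical helper written for illustration. It is not LangChain's implementation, and it ignores overlap entirely.

```python
from typing import List


def pack_words(text: str, chunk_size: int, separator: str = " ") -> List[str]:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole, since it cannot be split further here.
    """
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # The piece still fits, or it is the first piece of a new chunk.
            current = candidate
        else:
            # The piece does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


packed = pack_words("Hello world! This is a test document.", chunk_size=10)
print(packed)
# ['Hello', 'world!', 'This is a', 'test', 'document.']
```

With LangChain installed, the corresponding splitter would likely be created with CharacterTextSplitter(separator=" ", chunk_size=10, chunk_overlap=0).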

Step 3: Split the document into chunks preserving metadata
Call splitter.split_documents with a list containing doc and store the result in a variable named chunks.
Hint: Call split_documents on a list with doc inside, for example splitter.split_documents([doc]).
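
Putting the first three steps together might look like the sketch below. It assumes the langchain_core and langchain_text_splitters packages are installed (import paths vary between LangChain versions), and it adds separator=" " and chunk_overlap=0, which the exercise text does not state, so that a chunk size of 10 is accepted and the sample sentence is actually split. The import is guarded so the script only reports a message when LangChain is missing.

```python
try:
    # Assumed import paths for recent LangChain releases.
    from langchain_core.documents import Document
    from langchain_text_splitters import CharacterTextSplitter
    HAVE_LANGCHAIN = True
except ImportError:
    HAVE_LANGCHAIN = False

chunks = []
if HAVE_LANGCHAIN:
    doc = Document(
        page_content="Hello world! This is a test document.",
        metadata={"source": "test_source"},
    )
    # separator and chunk_overlap are assumptions added for this sketch.
    splitter = CharacterTextSplitter(separator=" ", chunk_size=10, chunk_overlap=0)
    # split_documents takes a list of documents and returns a list of chunk documents.
    chunks = splitter.split_documents([doc])
    for chunk in chunks:
        print(repr(chunk.page_content), chunk.metadata)
else:
    print("LangChain is not installed; skipping the split.")
```

Each printed line should show a short piece of the text next to the same {'source': 'test_source'} dictionary.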

Step 4: Verify each chunk preserves the original metadata
Add a for loop iterating over chunks with variable chunk. Inside the loop, assign chunk.metadata to a variable meta.
Hint: Use for chunk in chunks: and, inside the loop, assign meta = chunk.metadata.
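
One way to turn the loop into an actual check is sketched below. The helper all_chunks_keep_metadata is a hypothetical name, and the demo chunks are SimpleNamespace stand-ins rather than real LangChain documents, so the snippet runs without the library. The same function should work on the chunks list from the previous step, because it only reads the metadata attribute.

```python
from types import SimpleNamespace


def all_chunks_keep_metadata(chunks, expected):
    """Return True if every chunk carries metadata equal to the expected dictionary."""
    for chunk in chunks:
        meta = chunk.metadata
        if meta != expected:
            return False
    return True


# Stand-in chunks for illustration only; real chunks come from split_documents.
demo_chunks = [
    SimpleNamespace(page_content="Hello", metadata={"source": "test_source"}),
    SimpleNamespace(page_content="world!", metadata={"source": "test_source"}),
]

ok = all_chunks_keep_metadata(demo_chunks, {"source": "test_source"})
print(ok)
# True
```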