LangChain framework (~20 mins)

Metadata preservation during splitting in LangChain - Practice Problems & Coding Challenges

Challenge - 5 Problems
Component behavior
intermediate
What happens to metadata after splitting?
Given a document with metadata, what will the metadata of the first chunk be after splitting it with RecursiveCharacterTextSplitter.split_documents()?
LangChain
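# Current package layout; older releases exposed these as langchain.schema and langchain.text_splitter
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter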

doc = Document(page_content="Hello world! This is a test.", metadata={"source": "test_source", "page": 1})
splitter = RecursiveCharacterTextSplitter(chunk_size=12, chunk_overlap=0)
chunks = splitter.split_documents([doc])
first_chunk_metadata = chunks[0].metadata
A. {"source": "test_source", "page": 1}
B. {}
C. {"source": "test_source"}
D. None
💡 Hint
Think about whether the splitter copies metadata or discards it.
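To check your reasoning, here is a simplified, stdlib-only model of how split_documents() treats metadata: it splits each document's text and gives every resulting chunk its own deep copy of the source document's metadata. SimpleDoc and fixed_width_split are hypothetical stand-ins for LangChain's Document and its splitting logic, so the chunk boundaries differ from RecursiveCharacterTextSplitter's; only the metadata handling is modelled.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def fixed_width_split(text: str, chunk_size: int) -> list[str]:
    """Hypothetical splitter: cuts text into fixed-width pieces with no overlap."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def split_documents(docs: list[SimpleDoc], chunk_size: int) -> list[SimpleDoc]:
    """Model of split_documents: every chunk gets a deep copy of its source metadata."""
    chunks = []
    for doc in docs:
        for piece in fixed_width_split(doc.page_content, chunk_size):
            chunks.append(SimpleDoc(page_content=piece, metadata=copy.deepcopy(doc.metadata)))
    return chunks


doc = SimpleDoc("Hello world! This is a test.", {"source": "test_source", "page": 1})
chunks = split_documents([doc], chunk_size=12)
print(chunks[0].metadata)  # {'source': 'test_source', 'page': 1}

# Each chunk holds its own copy, so editing one chunk leaves the original untouched.
chunks[0].metadata["page"] = 99
print(doc.metadata["page"])  # 1
```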
State output
intermediate
How many chunks retain metadata after splitting?
If you split a single document with metadata into 3 chunks using LangChain's RecursiveCharacterTextSplitter, how many chunks will have the original metadata preserved?
LangChain
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

doc = Document(page_content="A long text that will be split into three parts.", metadata={"author": "Alice"})
splitter = RecursiveCharacterTextSplitter(chunk_size=20, chunk_overlap=0)
chunks = splitter.split_documents([doc])
count_with_metadata = sum(1 for c in chunks if c.metadata == {"author": "Alice"})
A. 1
B. 3
C. 0
D. Depends on chunk content
💡 Hint
Consider if metadata is copied or removed during splitting.
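The same idea extends to several documents split together: each chunk is paired with a deep copy of the metadata of the document it came from, so the number of chunks carrying a given metadata dict equals the number of chunks produced from that document. This is a stdlib-only sketch; split_words is a hypothetical greedy word packer, not LangChain's splitter, and chunks are plain (text, metadata) tuples.

```python
import copy
from collections import Counter


def split_words(text: str, max_len: int) -> list[str]:
    """Hypothetical greedy splitter: packs whole words into pieces of at most max_len chars."""
    pieces, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_len:
            current = candidate
        else:
            if current:
                pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def split_with_metadata(texts: list[str], metadatas: list[dict], max_len: int) -> list[tuple[str, dict]]:
    """Pair each chunk with a deep copy of its own document's metadata."""
    out = []
    for text, meta in zip(texts, metadatas):
        for piece in split_words(text, max_len):
            out.append((piece, copy.deepcopy(meta)))
    return out


chunks = split_with_metadata(
    ["A long text that will be split into three parts.", "Short note."],
    [{"author": "Alice"}, {"author": "Bob"}],
    max_len=20,
)
per_author = Counter(meta["author"] for _, meta in chunks)
print(per_author)  # Counter({'Alice': 3, 'Bob': 1})
```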
📝 Syntax
advanced
Which code correctly preserves metadata during splitting?
Which of the following snippets preserves each document's metadata on the resulting chunks using only the RecursiveCharacterTextSplitter's built-in behavior, without reattaching the metadata by hand?
A. chunks = splitter.split_documents([doc])
B. chunks = splitter.split_text(doc.page_content)
C. chunks = [Document(page_content=chunk) for chunk in splitter.split_text(doc.page_content)]
D. chunks = [Document(page_content=chunk, metadata=doc.metadata) for chunk in splitter.split_text(doc.page_content)]
💡 Hint
Look for the method that returns Document objects with metadata.
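If you do reattach metadata by hand, as option D does, it is safer to give each chunk its own copy. The sketch below uses a plain dataclass as a hypothetical stand-in for Document; a dataclass stores whatever dict it is given, so passing the same doc.metadata to every chunk makes them all share one object. It does not model whether LangChain's pydantic-based Document copies the dict on construction.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for Document; a dataclass does not copy its fields."""
    page_content: str
    metadata: dict = field(default_factory=dict)


source = SimpleDoc("alpha beta gamma", {"source": "notes.txt"})
pieces = source.page_content.split()  # hypothetical splitter: one word per chunk

# Shared: every chunk points at the very same dict as the source document.
shared = [SimpleDoc(p, source.metadata) for p in pieces]
# Copied: every chunk gets its own shallow copy.
copied = [SimpleDoc(p, dict(source.metadata)) for p in pieces]

shared[0].metadata["reviewed"] = True
print(source.metadata)     # {'source': 'notes.txt', 'reviewed': True}
print(copied[1].metadata)  # {'source': 'notes.txt'}
```

dict(...) is a shallow copy; if your metadata holds nested lists or dicts, copy.deepcopy (which the built-in splitter methods use internally) avoids sharing those as well.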
🔧 Debug
advanced
Why is metadata lost after splitting?
A developer calls splitter.split_text(doc.page_content), wraps each resulting string in a new Document, but forgets to pass the metadata. Assuming doc.metadata is {"source": "original"}, what does metadata_list contain?
LangChain
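from langchain_core.documents import Document
# Assumes `splitter` and `doc` already exist, e.g. as in the earlier problems
chunks_text = splitter.split_text(doc.page_content)
chunks = [Document(page_content=chunk) for chunk in chunks_text]  # metadata not passed
metadata_list = [chunk.metadata for chunk in chunks]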
A. None  # metadata attribute missing
B. [{"source": "original"}, ...]  # original metadata preserved
C. [{}, {}, ...]  # empty metadata dictionaries
D. Error: Document constructor requires metadata
💡 Hint
Check what happens if metadata is not passed when creating Document objects.
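LangChain's Document defaults metadata to an empty dict when none is passed. The stdlib-only sketch below mirrors that with a dataclass (a hypothetical stand-in) to show what the buggy code yields: the attribute exists, is not None, and each chunk gets its own empty dict.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for Document: metadata defaults to a new empty dict per instance."""
    page_content: str
    metadata: dict = field(default_factory=dict)


chunks_text = ["Some long", "text here."]  # hypothetical split_text() output
chunks = [SimpleDoc(page_content=t) for t in chunks_text]  # metadata never passed
metadata_list = [c.metadata for c in chunks]

print(metadata_list)  # [{}, {}]
print(metadata_list[0] is metadata_list[1])  # False: separate empty dicts
```

In LangChain itself, splitter.create_documents([doc.page_content], metadatas=[doc.metadata]) splits the text and attaches a copy of the metadata to every chunk, which avoids this bug.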
🧠 Conceptual
expert
How to ensure custom metadata updates during splitting?
You want to split documents and add a 'chunk_index' field to each chunk's metadata indicating its position. Which approach builds new chunks that keep the original metadata and add chunk_index, without mutating the chunks returned by split_documents()?
LangChain
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

doc = Document(page_content="Some long text here.", metadata={"source": "book"})
splitter = RecursiveCharacterTextSplitter(chunk_size=10, chunk_overlap=0)
chunks = splitter.split_documents([doc])
# Add chunk_index metadata here
A. chunks = splitter.split_text(doc.page_content)
   for i, text in enumerate(chunks): chunks[i] = Document(page_content=text, metadata={'chunk_index': i})
B. for i, c in enumerate(chunks): c.metadata['chunk_index'] = i
C. chunks = [Document(page_content=c.page_content, metadata={'chunk_index': i}) for i, c in enumerate(chunks)]
D. chunks = [Document(page_content=c.page_content, metadata={**c.metadata, 'chunk_index': i}) for i, c in enumerate(chunks)]
💡 Hint
Think about preserving original metadata and adding new keys immutably.
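A stdlib-only sketch of the non-mutating pattern: unpack the existing metadata into a new dict and add chunk_index, so the original keys survive and the input chunks are left untouched. SimpleDoc is a hypothetical stand-in for Document, and the two chunks are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Hypothetical chunks, each holding its own copy of the source metadata.
chunks = [SimpleDoc("Some long", {"source": "book"}), SimpleDoc("text here.", {"source": "book"})]

# Non-mutating: new chunks whose metadata merges the old keys with chunk_index.
indexed = [
    SimpleDoc(page_content=c.page_content, metadata={**c.metadata, "chunk_index": i})
    for i, c in enumerate(chunks)
]

print([c.metadata for c in indexed])
# [{'source': 'book', 'chunk_index': 0}, {'source': 'book', 'chunk_index': 1}]
print(chunks[0].metadata)  # {'source': 'book'}: the original chunks are unchanged
```

The in-place loop (c.metadata['chunk_index'] = i) also keeps source, because split_documents() gives each chunk its own copy of the metadata; building new objects is preferable when other code still holds references to the original chunks.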