Challenge - 5 Problems
Metadata Mastery in LangChain
❓ component_behavior
intermediate
What happens to metadata after splitting?
Given a document with metadata, what will be the metadata of the first chunk after splitting using a typical LangChain text splitter?
LangChain
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

doc = Document(
    page_content="Hello world! This is a test.",
    metadata={"source": "test_source", "page": 1},
)
splitter = RecursiveCharacterTextSplitter(chunk_size=12, chunk_overlap=0)
chunks = splitter.split_documents([doc])
first_chunk_metadata = chunks[0].metadata
💡 Hint
Think about whether the splitter copies metadata or discards it.
LangChain's RecursiveCharacterTextSplitter preserves the metadata of the original document in each chunk it creates. The first chunk therefore carries the same keys and values as the original: {"source": "test_source", "page": 1}.
❓ state_output
intermediate
How many chunks retain metadata after splitting?
If you split a single document with metadata into 3 chunks using LangChain's RecursiveCharacterTextSplitter, how many chunks will have the original metadata preserved?
LangChain
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

doc = Document(
    page_content="A long text that will be split into three parts.",
    metadata={"author": "Alice"},
)
splitter = RecursiveCharacterTextSplitter(chunk_size=20, chunk_overlap=0)
chunks = splitter.split_documents([doc])
count_with_metadata = sum(1 for c in chunks if c.metadata == {"author": "Alice"})
💡 Hint
Consider if metadata is copied or removed during splitting.
Each chunk created by the splitter retains the original document's metadata, so all 3 chunks have the metadata.
📝 Syntax
advanced
Which code correctly preserves metadata during splitting?
Which of the following code snippets correctly preserves the metadata of documents after splitting using LangChain's RecursiveCharacterTextSplitter?
💡 Hint
Look for the method that returns Document objects with metadata.
split_documents returns Document objects preserving metadata. split_text returns strings without metadata, so manual wrapping is needed to preserve metadata.
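The manual wrapping mentioned above might look like the following minimal sketch. The helper name wrap_chunks is hypothetical, the chunk strings are hard-coded stand-ins for the output of split_text, and the small fallback class is only an assumption that lets the example run when langchain_core is not installed.

```python
from dataclasses import dataclass, field

try:
    from langchain_core.documents import Document
except ImportError:
    # Stand-in with the same two fields, used only if langchain_core is absent.
    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)


def wrap_chunks(texts, source_doc):
    """Wrap plain-string chunks in Documents that carry the source metadata."""
    # dict(...) gives every chunk its own shallow copy, so editing one chunk's
    # metadata later does not leak into the other chunks or the source document.
    return [
        Document(page_content=text, metadata=dict(source_doc.metadata))
        for text in texts
    ]


doc = Document(
    page_content="Hello world! This is a test.",
    metadata={"source": "test_source", "page": 1},
)
# Hard-coded strings standing in for splitter.split_text(doc.page_content).
chunk_texts = ["Hello world!", "This is a", "test."]
chunks = wrap_chunks(chunk_texts, doc)
```

Copying the dictionary per chunk is a design choice: sharing one dictionary across all chunks would also "preserve" the metadata, but a later update to one chunk would then silently change every other chunk.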
🔧 Debug
advanced
Why is metadata lost after splitting?
A developer uses splitter.split_text(doc.page_content) and then creates new Document objects from the chunks but forgets to assign metadata. What is the result for metadata in the chunks?
LangChain
chunks_text = splitter.split_text(doc.page_content)
chunks = [Document(page_content=chunk) for chunk in chunks_text]
metadata_list = [chunk.metadata for chunk in chunks]
💡 Hint
Check what happens if metadata is not passed when creating Document objects.
If metadata is not passed, each Document gets an empty metadata dictionary by default, so metadata_list is a list of empty dictionaries and the original metadata is lost.
🧠 Conceptual
expert
How to ensure custom metadata updates during splitting?
You want to split documents but also add a new metadata field 'chunk_index' to each chunk indicating its order. Which approach correctly achieves this while preserving original metadata?
LangChain
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

doc = Document(page_content="Some long text here.", metadata={"source": "book"})
splitter = RecursiveCharacterTextSplitter(chunk_size=10, chunk_overlap=0)
chunks = splitter.split_documents([doc])
# Add chunk_index metadata here
💡 Hint
Think about preserving the original metadata and adding the new key without mutating the existing dictionaries.
Option D creates new Document objects merging original metadata with the new 'chunk_index' key, preserving all original metadata fields.
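One way to do the merge described above is sketched below. It is not a reproduction of the quiz option itself: the helper name add_chunk_index is hypothetical, the input chunks are hard-coded stand-ins for the output of split_documents, and the fallback class is only an assumption that lets the example run without langchain_core installed.

```python
from dataclasses import dataclass, field

try:
    from langchain_core.documents import Document
except ImportError:
    # Stand-in with the same two fields, used only if langchain_core is absent.
    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)


def add_chunk_index(chunks):
    """Return new Documents whose metadata gains a 'chunk_index' key."""
    # {**c.metadata, ...} builds a fresh dict per chunk: the original keys are
    # kept, the new key is added, and the input chunks are left unmodified.
    return [
        Document(
            page_content=c.page_content,
            metadata={**c.metadata, "chunk_index": i},
        )
        for i, c in enumerate(chunks)
    ]


# Hard-coded chunks standing in for splitter.split_documents([doc]).
chunks = [
    Document(page_content="Some long", metadata={"source": "book"}),
    Document(page_content="text here.", metadata={"source": "book"}),
]
indexed = add_chunk_index(chunks)
```

Building new Document objects rather than assigning into c.metadata keeps the original chunks reusable, which matters if the same list is also passed to another pipeline step.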