Given this code snippet, what will be the output chunk sizes?

LangChain - Text Splitting
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=10, chunk_overlap=2, separators=[])
text = "Hello world! This is langchain"  # 30 characters
chunks = splitter.split_text(text)
print([len(chunk) for chunk in chunks])
A. [12, 12, 12]
B. [10, 10, 10, 6]
C. [10, 8, 10, 6]
D. [8, 8, 8, 8]
Step-by-Step Solution
Solution:
  1. Step 1: Understand chunk_size and chunk_overlap

    chunk_size=10 caps each chunk at 10 characters; chunk_overlap=2 means each chunk begins with the last 2 characters of the previous chunk.
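  2. Step 2: Compute the chunks for the 30-character text

    With separators=[], the solution assumes the splitter falls back to fixed-size splitting. Each chunk then starts chunk_size - chunk_overlap = 10 - 2 = 8 characters after the previous one, so chunks start at offsets 0, 8, 16, and 24. The first three chunks are 10 characters long; the last covers the remaining 6 characters (offsets 24 to 29).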
  3. Final Answer:

    [10, 10, 10, 6] -> Option B
  4. Quick Check:

    Chunk sizes = [10, 10, 10, 6] [OK]
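
The arithmetic in Step 2 can be checked with a minimal plain-Python sketch. It does not call LangChain: fixed_size_chunks is a hypothetical helper that only models the fixed-size sliding window the solution assumes, not the library's actual implementation.

```python
def fixed_size_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Hypothetical helper: slide a window of chunk_size characters over text,
    advancing by chunk_size - chunk_overlap each time."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # 10 - 2 = 8 for the quiz values
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


text = "Hello world! This is langchain"
chunks = fixed_size_chunks(text, chunk_size=10, chunk_overlap=2)
print(chunks)                    # ['Hello worl', 'rld! This ', 's is langc', 'gchain']
print([len(c) for c in chunks])  # [10, 10, 10, 6]
```

Each chunk after the first begins with the last 2 characters of the previous one ("rl", "s ", "gc"), which is why four chunks are needed for 30 characters instead of three.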
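Quick Trick: chunk_size caps each chunk's length; chunk_overlap repeats the last characters of the previous chunk, so each new chunk advances by chunk_size - chunk_overlap [OK]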
Common Mistakes:
  • Ignoring overlap in chunk sizes
  • Assuming equal chunk sizes always
  • Miscounting text length
