RecursiveCharacterTextSplitter in LangChain - Step-by-Step Execution

Concept Flow - RecursiveCharacterTextSplitter
Input: Large Text
Check if text length <= chunk_size?
  Yes: Return text as chunk
  No: Continue below
Find the first separator from the list that occurs in the text
Split text by separator
For each part:
  Recursively call splitter on part
Collect chunks
Return all chunks
The splitter first checks whether the text already fits within chunk_size. If it does not, the splitter picks the first separator in its list that occurs in the text, splits on it, and recursively processes each part until every chunk fits.
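The flow above can be sketched in plain Python. This is a simplified model for illustration only, not LangChain's implementation: the helper name recursive_split is hypothetical, and the sketch ignores chunk overlap and does not merge small pieces back together.

```python
from typing import List


def recursive_split(text: str, chunk_size: int, separators: List[str]) -> List[str]:
    """Simplified model of the concept flow (hypothetical helper, not LangChain's code)."""
    # Base case: the text already fits, so it is returned as a single chunk.
    # Empty strings (produced by consecutive separators) are dropped.
    if len(text) <= chunk_size:
        return [text] if text else []

    # Pick the first separator in the list that occurs in the text.
    # The empty string always matches and means "split into characters".
    for index, separator in enumerate(separators):
        if separator == "" or separator in text:
            break
    else:
        # No separator applies, so the oversized text is returned unchanged.
        return [text]

    parts = list(text) if separator == "" else text.split(separator)
    remaining = separators[index + 1:]

    chunks: List[str] = []
    for part in parts:
        # Each part goes through the same check-and-split procedure.
        chunks.extend(recursive_split(part, chunk_size, remaining))
    return chunks


chunks = recursive_split("Hello world! This is a test.", 10, ["\n", " ", ""])
print(chunks)  # ['Hello', 'world!', 'This', 'is', 'a', 'test.']
```

Passing only the remaining separators to the recursive call means each level of recursion moves to a finer-grained separator, which is what guarantees the process terminates.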
Execution Sample
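from langchain_text_splitters import RecursiveCharacterTextSplitter  # older releases: langchain.text_splitter
# chunk_overlap=0 is set explicitly: the library's default overlap is larger than chunk_size=10 and would be rejected
splitter = RecursiveCharacterTextSplitter(chunk_size=10, chunk_overlap=0, separators=["\n", " ", ""])
chunks = splitter.split_text("Hello world! This is a test.")
Splits the input text into chunks of at most 10 characters, trying newline first, then space, then per-character splits, recursively.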
Execution Table
Step | Text Input | Condition (len <= 10?) | Separator Found | Action | Chunks Collected
1 | "Hello world! This is a test." | False (28 > 10) | "\n" not found, " " found | Split by space | []
2 | "Hello" | True (5 <= 10) | N/A | Return ["Hello"] | ["Hello"]
3 | "world!" | True (6 <= 10) | N/A | Return ["world!"] | ["Hello", "world!"]
4 | "This" | True (4 <= 10) | N/A | Return ["This"] | ["Hello", "world!", "This"]
5 | "is" | True (2 <= 10) | N/A | Return ["is"] | ["Hello", "world!", "This", "is"]
6 | "a" | True (1 <= 10) | N/A | Return ["a"] | ["Hello", "world!", "This", "is", "a"]
7 | "test." | True (5 <= 10) | N/A | Return ["test."] | ["Hello", "world!", "This", "is", "a", "test."]
8 | All parts processed | N/A | N/A | Return combined chunks | ["Hello", "world!", "This", "is", "a", "test."]
💡 All text parts are smaller than or equal to chunk_size, recursion ends.
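The trace above follows a simplified model in which every part becomes its own chunk. The LangChain implementation additionally merges adjacent small pieces back together while the merged text still fits within chunk_size, so real output for this input typically contains fewer, larger chunks. A minimal sketch of that merging idea, using a hypothetical merge_pieces helper and assuming the pieces are rejoined with a single space (this is not the library's own code):

```python
from typing import List


def merge_pieces(pieces: List[str], chunk_size: int, joiner: str = " ") -> List[str]:
    """Greedily join adjacent pieces while the result stays within chunk_size."""
    merged: List[str] = []
    current = ""
    for piece in pieces:
        # Try to append the next piece to the chunk being built.
        candidate = piece if not current else current + joiner + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            # The piece does not fit: close the current chunk and start a new one.
            if current:
                merged.append(current)
            current = piece
    if current:
        merged.append(current)
    return merged


pieces = ["Hello", "world!", "This", "is", "a", "test."]
merged = merge_pieces(pieces, chunk_size=10)
print(merged)  # ['Hello', 'world!', 'This is a', 'test.']
```

Here "This", "is" and "a" fit together in 9 characters, so they end up in one chunk instead of three.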
Variable Tracker
Variable | Start | After Step 1 | After Step 2-7 | Final
text | "Hello world! This is a test." | "Hello world! This is a test." | Each part of split text | N/A
chunks | [] | [] | ["Hello", "world!", "This", "is", "a", "test."] | ["Hello", "world!", "This", "is", "a", "test."]
Key Moments - 3 Insights
Why does the splitter try multiple separators instead of just splitting by space?
The splitter tries the separators in the order they are listed, so the text is split at the most meaningful boundary available. If a separator does not occur in the text, the splitter moves on to the next one. Step 1 shows this: newline is not found, so space is used instead.
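The ordered lookup described in this answer can be illustrated with a small sketch. The helper first_matching_separator is hypothetical and only models the selection step; treating the empty string as "always matches" is an assumption of this model.

```python
from typing import List, Optional


def first_matching_separator(text: str, separators: List[str]) -> Optional[str]:
    """Return the first separator that occurs in text, or None (hypothetical helper)."""
    for separator in separators:
        # The empty string is treated as "always matches" (character-level split).
        if separator == "" or separator in text:
            return separator
    return None


separators = ["\n", " ", ""]
print(repr(first_matching_separator("Hello world! This is a test.", separators)))  # ' '
print(repr(first_matching_separator("line one\nline two", separators)))  # '\n'
print(repr(first_matching_separator("Hello", separators)))  # ''
```

The second call shows why order matters: the text contains both newlines and spaces, but newline comes first in the list, so it wins.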
What happens when a text part is smaller than chunk_size?
When the text length is less than or equal to chunk_size, the splitter returns the text as a chunk without splitting it further, as seen in Steps 2-7.
Why is recursion needed here?
Recursion lets the splitter break large text down step by step: each part produced by a split is checked again and, if it is still too long, split on the next separator, until every chunk fits the size limit.
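To make the role of recursion visible, the sketch below records the recursion depth at which each chunk is produced. It uses the same simplified model as the rest of this walkthrough (no merging, no overlap); split_with_depth is a hypothetical helper, not part of LangChain.

```python
from typing import List, Tuple


def split_with_depth(text: str, chunk_size: int, separators: List[str],
                     depth: int = 0) -> List[Tuple[int, str]]:
    """Simplified recursive split that records the depth of each chunk."""
    if len(text) <= chunk_size:
        return [(depth, text)] if text else []
    for index, separator in enumerate(separators):
        if separator == "" or separator in text:
            parts = list(text) if separator == "" else text.split(separator)
            result: List[Tuple[int, str]] = []
            for part in parts:
                # Recurse one level deeper, using only the finer separators.
                result.extend(
                    split_with_depth(part, chunk_size, separators[index + 1:], depth + 1)
                )
            return result
    return [(depth, text)]  # no usable separator left


traced = split_with_depth("Hi abcdefgh", 4, [" ", ""])
for depth, chunk in traced:
    print(depth, repr(chunk))
```

"Hi" fits after the first split on space (depth 1), while "abcdefgh" is still longer than 4 characters and needs a second, character-level split (depth 2).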
Visual Quiz - 3 Questions
Test your understanding
According to the execution table, which chunk is returned after processing the text "is"?
A. ["is"]
B. ["a"]
C. ["This"]
D. ["test."]
💡 Hint
Check Step 5 in the execution table: the Action column shows the chunk returned, and it is the last entry added under Chunks Collected.
At which step does the splitter find the first separator to split the text?
A. Step 2
B. Step 1
C. Step 8
D. Step 4
💡 Hint
Look at Step 1 in the execution table, where the separator " " is found.
If chunk_size were increased to 30, how would the execution table change?
A. The splitter would split the text into more chunks.
B. The splitter would not find any separator.
C. The splitter would return the whole text as one chunk at Step 1.
D. The recursion would go deeper.
💡 Hint
Refer to the Condition column in Step 1 and compare the text length (28) with the new chunk_size.
Concept Snapshot
RecursiveCharacterTextSplitter splits large text into smaller chunks.
It tries the separators in the order they are listed.
If the text is small enough, it is returned as a chunk.
Otherwise, the text is split recursively until all chunks fit the size limit.
Useful for processing long texts in manageable pieces.
Full Transcript
The RecursiveCharacterTextSplitter takes a large text and splits it into smaller chunks based on a maximum chunk size. It checks if the text length is less than or equal to the chunk size. If yes, it returns the text as a chunk. If not, it tries to find a separator from a list to split the text. It splits the text by the first found separator and recursively applies the same process to each part. This continues until all chunks are small enough. For example, splitting "Hello world! This is a test." with chunk size 10 and separators newline, space, and empty string results in chunks like "Hello", "world!", "This", "is", "a", "test.". This method helps break down large texts into manageable pieces for further processing.