This flow shows how large text is split into smaller chunks with overlaps to keep context between chunks.
Execution Sample
LangChain
text = "The quick brown fox jumps over the lazy dog"
chunk_size = 10
overlap = 3
chunks = []
start = 0
while start < len(text):
    end = start + chunk_size
    chunk = text[start:end]  # slicing stops at the text end if end is too large
    chunks.append(chunk)
    start += chunk_size - overlap  # step forward, keeping `overlap` characters
This code splits a sentence into chunks of 10 characters with 3 characters overlapping between chunks.
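The same loop can be wrapped in a reusable function. The sketch below is one possible way to do it; the name `chunk_text` and the argument checks are assumptions added for illustration and are not part of the original sample. The checks matter because an overlap equal to or larger than chunk_size would make the step zero or negative, and the loop would never finish.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into chunks of up to chunk_size characters,
    each sharing `overlap` characters with the previous chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far start advances each iteration
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks


text = "The quick brown fox jumps over the lazy dog"
chunks = chunk_text(text, chunk_size=10, overlap=3)
for i, chunk in enumerate(chunks, start=1):
    print(i, repr(chunk))  # repr() makes trailing spaces visible
```

Printing with `repr()` is a deliberate choice here: some chunks end in a space, which is easy to miss otherwise.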
Execution Table
| Step | start | end | chunk extracted | Action | Next start |
|------|-------|-----|-----------------|--------|------------|
| 1 | 0 | 10 | "The quick " | Extract text[0:10] | 7 |
| 2 | 7 | 17 | "ck brown f" | Extract text[7:17] | 14 |
| 3 | 14 | 24 | "n fox jump" | Extract text[14:24] | 21 |
| 4 | 21 | 31 | "umps over " | Extract text[21:31] | 28 |
| 5 | 28 | 38 | "er the laz" | Extract text[28:38] | 35 |
| 6 | 35 | 45 | "lazy dog" | Extract text[35:45]; the slice stops at the text end (43) | 42 |
| 7 | 42 | 52 | "g" | Extract text[42:52]; the slice stops at the text end (43) | 49 |
| 8 | 49 | - | - | start (49) is not less than len(text) (43), loop stops | - |
💡 start index 49 is beyond text length 43, loop ends
Variable Tracker
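| Variable | Start | After 1 | After 2 | After 3 | After 4 | After 5 | After 6 | After 7 | Final |
|----------|-------|---------|---------|---------|---------|---------|---------|---------|-------|
| start | 0 | 7 | 14 | 21 | 28 | 35 | 42 | 49 | 49 |
| end | - | 10 | 17 | 24 | 31 | 38 | 45 | 52 | 52 |
| chunk | - | "The quick " | "ck brown f" | "n fox jump" | "umps over " | "er the laz" | "lazy dog" | "g" | "g" |
| len(chunks) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 7 |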
Key Moments - 3 Insights
Why does the start index increase by chunk_size - overlap instead of chunk_size?
Because we want chunks to overlap, so we move start forward by less than chunk_size to keep some text from the previous chunk. See execution_table rows where start moves 0->7->14 instead of 0->10->20.
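To make the step size concrete, here is a small sketch that lists the start positions with and without overlap for the sample sentence. The variable names `starts_with_overlap` and `starts_without_overlap` are made up for this illustration.

```python
text = "The quick brown fox jumps over the lazy dog"
chunk_size = 10
overlap = 3

# With overlap, start advances by chunk_size - overlap = 7.
starts_with_overlap = list(range(0, len(text), chunk_size - overlap))
# Without overlap, start would advance by the full chunk_size = 10.
starts_without_overlap = list(range(0, len(text), chunk_size))

print(starts_with_overlap)     # [0, 7, 14, 21, 28, 35, 42]
print(starts_without_overlap)  # [0, 10, 20, 30, 40]

# The last `overlap` characters of one chunk are the first of the next.
first = text[0:10]
second = text[7:17]
print(repr(first[-overlap:]), repr(second[:overlap]))  # 'ck ' 'ck '
```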
What happens when the end index goes beyond the text length?
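Python slicing clamps an out-of-range end index to the end of the string, so no error is raised. In steps 6 and 7 the computed end (45 and 52) is larger than the text length of 43, so those chunks simply stop at the last character and come out shorter than chunk_size. See execution_table rows 6 and 7.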
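The slicing behaviour can be checked directly. This is a minimal demonstration on the sample sentence; note that indexing a single position past the end does raise an error, unlike slicing.

```python
text = "The quick brown fox jumps over the lazy dog"

print(len(text))          # 43
print(repr(text[35:45]))  # 'lazy dog' (end is clamped to 43)
print(repr(text[42:52]))  # 'g'
print(repr(text[49:59]))  # '' (start is already past the end)

# A single out-of-range index raises IndexError; a slice does not.
try:
    text[45]
    raised = False
except IndexError:
    raised = True
print(raised)  # True
```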
Why does the loop stop when start is beyond text length?
Because no more chunks can be extracted. The condition start < len(text) fails at step 8 when start is 49 and text length is 43. See exit_note.
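One side effect of the condition start < len(text) is the final chunk "g", whose text is already fully contained in the chunk before it. If that is unwanted, one possible variant stops as soon as a chunk reaches the end of the text. The function name `chunk_text_no_tail` is hypothetical, and this early exit is an optional design choice rather than part of the original sample.

```python
def chunk_text_no_tail(text, chunk_size, overlap):
    """Overlapping chunks, stopping once a chunk reaches the text end."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break  # this chunk already covers the rest of the text
        start += chunk_size - overlap
    return chunks


text = "The quick brown fox jumps over the lazy dog"
tail_free = chunk_text_no_tail(text, chunk_size=10, overlap=3)
print(len(tail_free))       # 6
print(repr(tail_free[-1]))  # 'lazy dog'
```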
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table, what is the chunk extracted at step 3?
A. "umps over "
B. "n fox jump"
C. "ck brown f"
D. "er the laz"
💡 Hint
Check the 'chunk extracted' column at step 3 in the execution_table.
At which step does the start index first become greater than or equal to the text length?
A. Step 8
B. Step 7
C. Step 6
D. Step 5
💡 Hint
Look at the 'start' column in variable_tracker and execution_table rows.
If overlap was set to 0, how would the start index change between steps?
A. start would increase by chunk_size - 1 each step
B. start would stay the same
C. start would increase by chunk_size each step
D. start would increase by overlap each step
💡 Hint
Overlap is subtracted from chunk_size to get the step. With zero overlap nothing is subtracted, so start advances by the full chunk_size.
Concept Snapshot
Chunking splits large text into smaller pieces.
Each chunk has a fixed maximum size and repeats the last `overlap` characters of the previous chunk to keep context.
Start index moves by chunk_size minus overlap, so overlap must be smaller than chunk_size.
Last chunk may be shorter because the slice stops at the end of the text.
Overlap keeps information that straddles a chunk boundary from being cut off.
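As a quick sanity check on these points, the number of chunks produced by this loop equals the text length divided by the step, rounded up. The sketch below works that out for the sample values; the names `step` and `expected_count` are introduced only for this example.

```python
import math

text = "The quick brown fox jumps over the lazy dog"
chunk_size, overlap = 10, 3
step = chunk_size - overlap  # 7

# One chunk per start position 0, step, 2*step, ... below len(text).
expected_count = math.ceil(len(text) / step)

chunks = [text[s:s + chunk_size] for s in range(0, len(text), step)]
print(expected_count, len(chunks))  # 7 7
```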
Full Transcript
This visual execution shows how a long text is split into smaller chunks with overlaps. We start at index 0 and extract a chunk of up to chunk_size characters. We then move the start index forward by chunk_size minus overlap, so each new chunk repeats the last few characters of the previous one and keeps context across the boundary. The process repeats until the start index reaches or passes the text length. The last chunks may be shorter because the slice stops at the end of the text. Text-processing frameworks such as LangChain use this kind of overlapping split to handle large documents in manageable pieces while preserving continuity.