LangChain framework · ~10 mins

Token-based splitting in LangChain - Step-by-Step Execution

Concept Flow - Token-based splitting
1. Input Text
2. Initialize Tokenizer
3. Split Text into Tokens
4. Group Tokens into Chunks
5. Return List of Text Chunks
The text is first tokenized, then tokens are grouped into chunks, and finally the chunks are returned as split text.
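The flow above can be sketched as a minimal, library-free loop. Whitespace tokenization and the helper name split_by_tokens are illustrative assumptions here, not LangChain's API (the real splitter uses a proper tokenizer such as tiktoken):

```python
def split_by_tokens(text, chunk_size):
    """Group tokens into fixed-size chunks; leftovers form a smaller last chunk."""
    tokens = text.split()  # assumption: whitespace tokenization, for illustration only
    chunks = []
    current = []
    for token in tokens:
        current.append(token)
        if len(current) == chunk_size:       # chunk is full: save it
            chunks.append(" ".join(current))
            current = []                     # clear to start the next chunk
    if current:                              # leftover tokens form a smaller final chunk
        chunks.append(" ".join(current))
    return chunks

print(split_by_tokens("Hello world this is a test of token splitting", 5))
# → ['Hello world this is a', 'test of token splitting']
```

The structure mirrors the five flow steps: tokenize, group until the size limit, save, repeat, return.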
Execution Sample
from langchain.text_splitter import TokenTextSplitter

text = "Hello world! This is a test of token splitting."
splitter = TokenTextSplitter(chunk_size=5, chunk_overlap=0)
chunks = splitter.split_text(text)
print(chunks)
This code splits the input text into chunks of 5 tokens each using LangChain's TokenTextSplitter. Note that chunk_overlap=0 is set explicitly: the splitter's default overlap (200) is larger than chunk_size=5, and a chunk_overlap greater than chunk_size raises a ValueError.
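When chunk_overlap is nonzero, consecutive chunks share their boundary tokens, which is how TokenTextSplitter preserves context across chunk borders. A rough sketch of that sliding-window behavior over a token list (split_with_overlap is a hypothetical helper, not the library's implementation):

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Sliding window over tokens: each chunk starts
    chunk_size - chunk_overlap tokens after the previous one."""
    step = chunk_size - chunk_overlap  # assumes 0 <= chunk_overlap < chunk_size
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):  # last window already covers the tail
            break
    return chunks

tokens = ["Hello", "world", "!", "This", "is", "a", "test"]
print(split_with_overlap(tokens, chunk_size=5, chunk_overlap=2))
# → [['Hello', 'world', '!', 'This', 'is'], ['This', 'is', 'a', 'test']]
```

Note how 'This' and 'is' appear in both chunks: that repetition is the overlap.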
Execution Table

| Step | Action | Tokens Processed | Current Chunk Tokens | Chunks Formed |
|---|---|---|---|---|
| 1 | Initialize tokenizer and input text | 0 | | |
| 2 | Tokenize text | 0 | | |
| 3 | Add token 'Hello' to current chunk | 1 | Hello | |
| 4 | Add token 'world' to current chunk | 2 | Hello, world | |
| 5 | Add token '!' to current chunk | 3 | Hello, world, ! | |
| 6 | Add token 'This' to current chunk | 4 | Hello, world, !, This | |
| 7 | Add token 'is' to current chunk | 5 | Hello, world, !, This, is | |
| 8 | Current chunk reached chunk_size=5, save chunk | 5 | | Hello world! This is |
| 9 | Add token 'a' to new chunk | 6 | a | Hello world! This is |
| 10 | Add token 'test' to current chunk | 7 | a, test | Hello world! This is |
| 11 | Add token 'of' to current chunk | 8 | a, test, of | Hello world! This is |
| 12 | Add token 'token' to current chunk | 9 | a, test, of, token | Hello world! This is |
| 13 | Add token 'splitting' to current chunk | 10 | a, test, of, token, splitting | Hello world! This is |
| 14 | Current chunk reached chunk_size=5, save chunk | 10 | | Hello world! This is; a test of token splitting |
| 15 | Add token '.' to new chunk | 11 | . | Hello world! This is; a test of token splitting |
| 16 | End of tokens, save last chunk | 11 | | Hello world! This is; a test of token splitting; . |
| 17 | Return all chunks | 11 | | Hello world! This is; a test of token splitting; . |
💡 All tokens processed and chunks returned.
Variable Tracker

| Variable | Start | After Step 8 | After Step 14 | After Step 16 | Final |
|---|---|---|---|---|---|
| tokens_processed | 0 | 5 | 10 | 11 | 11 |
| current_chunk_tokens | (empty) | (empty) | (empty) | (empty) | (empty) |
| chunks_formed | (empty) | Hello world! This is | Hello world! This is; a test of token splitting | Hello world! This is; a test of token splitting; . | Hello world! This is; a test of token splitting; . |
Key Moments - 2 Insights
Why does the current chunk reset after reaching chunk_size?
Because once the chunk reaches the specified token limit (chunk_size=5), it is saved to the chunks list and the current chunk is cleared to start collecting the next group of tokens. See execution table rows 7-8 and 13-14.
What happens to leftover tokens that don't fill a full chunk?
Leftover tokens form a smaller chunk at the end. In this example, the last token '.' forms its own chunk, as shown in execution table rows 15-16.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table at step 7. How many tokens are in the current chunk?
A. 4
B. 3
C. 5
D. 6
💡 Hint
Check the 'Current Chunk Tokens' column at step 7 in the execution table.
At which step does the first chunk get saved to the chunks list?
A. Step 8
B. Step 10
C. Step 5
D. Step 14
💡 Hint
Look for when 'Chunks Formed' first contains a chunk in the execution table.
If chunk_size was changed to 3, how would the number of chunks change?
A. Fewer chunks would be created
B. More chunks would be created
C. The number of chunks stays the same
D. No chunks would be created
💡 Hint
Smaller chunk_size means smaller groups, so more chunks overall. Refer to the chunk_size effect in the concept flow.
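For the last question, a quick back-of-the-envelope check: with no overlap, the number of chunks is ceil(num_tokens / chunk_size), so shrinking chunk_size from 5 to 3 raises the count:

```python
import math

num_tokens = 11  # token count from the walkthrough above
for chunk_size in (5, 3):
    # ceil division: leftover tokens still need one (smaller) chunk
    print(f"chunk_size={chunk_size}: {math.ceil(num_tokens / chunk_size)} chunks")
# → chunk_size=5: 3 chunks
# → chunk_size=3: 4 chunks
```

So with chunk_size=3 the same 11 tokens produce 4 chunks instead of 3, confirming answer B.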
Concept Snapshot
Token-based splitting:
- Input text is split into tokens.
- Tokens are grouped into chunks of fixed size.
- Each chunk is joined back to text.
- Leftover tokens form a smaller last chunk.
- Useful for processing text in manageable pieces.
Full Transcript
Token-based splitting takes a long text and breaks it into smaller parts based on tokens. First, the text is split into tokens using a tokenizer. Then tokens are collected into groups called chunks, each with a fixed number of tokens. When a chunk reaches the set size, it is saved and a new chunk starts. If tokens remain at the end that don't fill a full chunk, they form a smaller chunk. This method helps handle large texts by splitting them into smaller, manageable pieces for processing.