The RecursiveCharacterTextSplitter breaks long text into smaller pieces. It works through a list of separators in order: it splits on the first separator, and any piece that is still too long is split again with the next one, so the final pieces are small enough to handle.
RecursiveCharacterTextSplitter in LangChain
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ".", "!", "?", ",", " ", ""]
)
chunks = text_splitter.split_text(long_text)
The chunk_size sets the maximum size of each chunk, measured in characters by default.
The chunk_overlap sets how many characters neighboring chunks may share, so context is not lost at chunk boundaries.
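To make the two settings concrete, here is a minimal pure-Python sketch. It is not LangChain's code: the helper name sliding_chunks is hypothetical, and it cuts at fixed character positions, whereas the real splitter cuts at separators, so its actual overlap can be shorter than chunk_overlap.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Each window starts chunk_overlap characters before the previous one ended.
    Hypothetical helper for illustration only, not part of LangChain.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = sliding_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk begins with the last three characters of the one before it, which is the shared context that chunk_overlap is meant to preserve.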
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_text(long_text)

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=0,
    separators=["\n", ".", " "]
)
chunks = text_splitter.split_text(long_text)

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100, separators=["\n\n", "."])
chunks = text_splitter.split_text(long_text)
This program splits a text with three paragraphs into chunks of at most 50 characters, with up to 10 characters of overlap between neighboring chunks to keep context. It tries to split by paragraphs first, then at sentence punctuation, then at commas, then at spaces between words.
from langchain.text_splitter import RecursiveCharacterTextSplitter

long_text = (
    "This is the first paragraph. It has two sentences.\n\n"
    "Here is the second paragraph! It also has sentences? Yes, it does.\n\n"
    "Finally, the third paragraph is here."
)

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=50,
    chunk_overlap=10,
    separators=["\n\n", ".", "!", "?", ",", " ", ""]
)
chunks = text_splitter.split_text(long_text)

print("Number of chunks:", len(chunks))
for i, chunk in enumerate(chunks, 1):
    print(f"Chunk {i}:", repr(chunk))
The splitter tries separators in the order given, so list the coarsest boundaries (such as paragraph breaks) first and the finest (such as spaces) last.
Running time grows with the text length and the number of separators, but splitting is fast in practice for normal documents.
Common mistake: setting chunk_size too small creates many tiny chunks, each carrying too little context to be useful.
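The effect of separator order can be seen in a simplified pure-Python sketch of the recursive idea. This is an illustration under stated assumptions, not LangChain's implementation: recursive_split is a hypothetical name, it adds no overlap, and it drops the separator wherever it cuts.

```python
def recursive_split(text: str, chunk_size: int, separators: list[str]) -> list[str]:
    """Simplified sketch of recursive splitting (hypothetical helper).

    Split on the first separator, greedily merge neighboring pieces while
    they still fit in chunk_size, and recurse with the remaining separators
    on any single piece that is too long. No overlap is added.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Nothing left to split on: fall back to hard cuts.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    first, rest = separators[0], separators[1:]
    chunks = []
    current = ""  # pieces merged so far for the chunk being built
    for piece in text.split(first):
        candidate = piece if not current else current + first + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: try the finer separators on it.
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


text = "One. Two three.\n\nFour five six. Seven."

# Coarse separators first: paragraph boundaries are respected.
by_paragraph = recursive_split(text, 20, ["\n\n", ". ", " "])
# Fine separators first: a chunk can straddle the paragraph break.
by_word = recursive_split(text, 20, [" ", ". ", "\n\n"])

print(by_paragraph)
print(by_word)
```

With paragraphs first, every chunk stays inside one paragraph. With spaces first, the middle chunk contains the blank line between the two paragraphs, which is usually not what you want.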
RecursiveCharacterTextSplitter breaks text into manageable chunks using multiple separators.
It keeps context by overlapping parts of chunks.
It is useful for preparing text for language models or any tool with input size limits.