Text splitters break long text into smaller parts. The key metric is chunk quality: how well the text is split without losing meaning or context. Good splits keep sentences whole and keep related ideas together, which helps models make sense of each chunk on its own.
Text splitters in Prompt Engineering / GenAI - Model Metrics & Evaluation
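A minimal sketch of a sentence-aware splitter (the name `split_text` and the naive regex sentence detector are my own illustration, not a library API): it packs whole sentences into chunks so no sentence is ever cut mid-way.

```python
import re

def split_text(text, max_chunk_chars=300):
    """Greedily pack whole sentences into chunks of at most max_chunk_chars.

    Sentences are detected with a naive end-of-sentence regex; a single
    sentence longer than the limit becomes its own (oversized) chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chunk_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Because chunks only ever end at sentence boundaries, the "sentence breaks inside chunks" count for this splitter is zero by construction.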
Example of text splitter evaluation:
- Original text length: 960 characters (the chunk sizes below sum to 1,000 because each of the two 20-character overlaps is counted in both neighboring chunks)
- Split into chunks:
  - Chunk 1: 300 chars
  - Chunk 2: 350 chars
  - Chunk 3: 350 chars
Evaluation:
- Overlap between consecutive chunks: 20 chars (good for context)
- Sentence breaks inside chunks: 0 (ideal)
- Meaning preserved: 95% (human score)
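The size and overlap figures above can be measured directly from the chunks. A small sketch (function names are my own, chosen for illustration):

```python
def measure_overlap(prev_chunk, next_chunk, max_check=100):
    """Length of the longest suffix of prev_chunk that is also a prefix of next_chunk."""
    limit = min(len(prev_chunk), len(next_chunk), max_check)
    for k in range(limit, 0, -1):
        if prev_chunk[-k:] == next_chunk[:k]:
            return k
    return 0

def chunk_report(chunks):
    """Chunk sizes plus the measured overlap between each consecutive pair."""
    overlaps = [measure_overlap(a, b) for a, b in zip(chunks, chunks[1:])]
    return {"sizes": [len(c) for c in chunks], "overlaps": overlaps}
```

Sentence-boundary and meaning-preservation scores, by contrast, need a reference segmentation or a human rating; they cannot be read off the chunks alone.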
No confusion matrix applies directly to text splitting, but chunk overlap and sentence boundary accuracy play a similar diagnostic role.
For text splitters, think of precision as the fraction of splits that land in the right place (not breaking sentences), and recall as the fraction of important split points (like paragraph ends) that the splitter actually finds.
High precision, low recall: Splits only at perfect points but misses some natural breaks. Result: chunks may be too big.
High recall, low precision: Splits at many points, including bad ones. Result: chunks may be too small or cut sentences.
Good text splitters balance both to keep chunks meaningful and manageable.
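Treating split positions as character offsets, the precision/recall analogy above can be computed directly. A sketch, assuming a human-annotated set of "natural break" offsets to compare against (the function name is my own):

```python
def split_point_metrics(predicted, reference):
    """Precision and recall over split positions (character offsets).

    predicted: offsets where the splitter cut the text.
    reference: offsets a human marked as natural breaks (e.g. paragraph ends).
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall
```

In practice an exact-offset match is strict; a tolerance window (e.g. within a few characters of a reference break) is a common relaxation.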
- Good: Sentence boundary accuracy > 95%, chunk overlap 10-30 chars, chunk size consistent, meaning preserved > 90%
- Bad: Sentence breaks inside chunks > 20%, chunk overlap 0 or very large (losing context), chunks too uneven or too small, meaning preserved < 70%
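A minimal helper (my own naming, not from any library) that turns the checklist above into a pass/fail check; accuracy and meaning scores are taken as fractions between 0 and 1:

```python
def evaluate_splitter(boundary_accuracy, overlap_chars, meaning_score):
    """Flag a splitter as 'good' or 'needs work' using the rules of thumb above.

    Thresholds mirror the checklist: boundary accuracy > 0.95,
    overlap of 10-30 characters, meaning preserved > 0.90.
    """
    checks = [
        boundary_accuracy > 0.95,
        10 <= overlap_chars <= 30,
        meaning_score > 0.90,
    ]
    return "good" if all(checks) else "needs work"
```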
Common pitfalls:
- Ignoring sentence boundaries causes chunks that confuse models.
- Too little overlap loses context between chunks.
- Too much overlap wastes space and slows processing.
- Evaluating only chunk size without meaning can mislead.
- Using only automatic metrics without human checks misses quality issues.
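The overlap trade-off is easiest to see in a fixed-size sliding-window splitter, a minimal sketch (my own naming): each chunk repeats the tail of the previous one so context carries across the boundary, and the larger the overlap, the more characters are processed twice.

```python
def sliding_chunks(text, size=300, overlap=20):
    """Fixed-size chunks where each repeats the last `overlap` chars of the previous.

    overlap must be smaller than size, or the window would never advance.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```

With `size=300` and `overlap=20`, each window advances 280 characters, so roughly 7% of the text is duplicated; at `overlap=150` that cost rises to half the text.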
Your text splitter creates chunks with 98% sentence boundary accuracy but only 10 characters overlap between chunks. Is this good?
Answer: Mostly, yes. Sentence boundaries are respected, which keeps each chunk's meaning clear. However, 10 characters of overlap is probably too little to carry context between chunks; increasing the overlap slightly helps models connect ideas across adjacent chunks.