
Text splitters in Prompt Engineering / GenAI - Model Pipeline Trace

Model Pipeline - Text splitters

This pipeline breaks long text into smaller pieces that fit within a model's input limits and are easier to process. It first splits the text into sentences or paragraphs, then groups those pieces into size-limited chunks.

Data Flow - 4 Stages

Stage 1: Raw Text Input
  Input: 1 document x 1000 words
  Operation: Receive full text document
  Output: 1 document x 1000 words
  Example: "Today is sunny. We will go to the park. Then have lunch."

Stage 2: Sentence Splitter
  Input: 1 document x 1000 words
  Operation: Split text into sentences using punctuation
  Output: 3 sentences x variable length
  Example: ["Today is sunny.", "We will go to the park.", "Then have lunch."]

Stage 3: Paragraph Splitter
  Input: 1 document x 1000 words
  Operation: Split text into paragraphs by newline characters
  Output: 3 paragraphs x variable length
  Example: ["Today is sunny. We will go to the park.", "Then have lunch.", "After that, we will read books."]

Stage 4: Chunk Creation
  Input: 10 sentences or 3 paragraphs
  Operation: Group sentences or paragraphs into chunks of max 100 words
  Output: 5 chunks x up to 100 words each
  Example: ["Today is sunny. We will go to the park.", "Then have lunch. After that, we will read books."]

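The stages above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual implementation: the function names (`split_sentences`, `split_paragraphs`, `make_chunks`) are hypothetical, the sentence rule is a simple punctuation-based regex that will mis-split abbreviations such as "Dr.", and the greedy grouping strategy is an assumption.

```python
import re

def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?' (naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def split_paragraphs(text):
    """Split on one or more newline characters, dropping empty pieces."""
    return [p.strip() for p in re.split(r"\n+", text) if p.strip()]

def make_chunks(pieces, max_words=100):
    """Greedily group consecutive pieces so each chunk stays within max_words.

    A single piece longer than max_words is kept whole as its own chunk.
    """
    chunks, current, count = [], [], 0
    for piece in pieces:
        n = len(piece.split())
        # Start a new chunk when adding this piece would exceed the limit.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(piece)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

text = "Today is sunny. We will go to the park. Then have lunch."
sentences = split_sentences(text)

paragraphs = split_paragraphs(
    "Today is sunny. We will go to the park.\n"
    "Then have lunch.\n\n"
    "After that, we will read books."
)

# A small limit is used here so the short sample produces more than one chunk.
chunks = make_chunks(sentences, max_words=10)
```

With the 100-word limit from Stage 4, the short sample would fit in a single chunk; the limit of 10 words is chosen only to make the grouping visible.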
Training Trace - Epoch by Epoch

Loss
0.5 |****
0.4 |*** 
0.3 |**  
0.2 |*   
0.1 |    
     +----
      1 2 3 4 Epochs
Epoch | Loss ↓ | Accuracy ↑ | Observation
1 | 0.45 | 0.60 | Initial split quality is moderate, some sentences split incorrectly.
2 | 0.30 | 0.75 | Improved splitting rules reduce errors, better sentence boundaries.
3 | 0.20 | 0.85 | Splitting is mostly correct, chunk sizes optimized for model input.
4 | 0.15 | 0.90 | Final tuning reduces overlap and preserves context well.
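The trade-off between overlap and context preservation mentioned in the last epoch can be illustrated with a sliding window over sentences. The trace does not define how overlap is measured, so this sketch assumes it means the number of sentences shared between neighbouring chunks; `chunk_with_overlap` and its parameters are hypothetical names.

```python
def chunk_with_overlap(sentences, max_sentences=2, overlap=1):
    """Group sentences into windows that share `overlap` sentences with the next window."""
    if not 0 <= overlap < max_sentences:
        raise ValueError("overlap must be >= 0 and smaller than max_sentences")
    step = max_sentences - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + max_sentences]))
        # Stop once the current window has reached the last sentence.
        if start + max_sentences >= len(sentences):
            break
    return chunks

sample = [
    "Today is sunny.",
    "We will go to the park.",
    "Then have lunch.",
    "After that, we will read books.",
]

overlapping = chunk_with_overlap(sample, max_sentences=2, overlap=1)
disjoint = chunk_with_overlap(sample, max_sentences=2, overlap=0)
```

A larger overlap repeats more text across chunks, which carries context over chunk boundaries at the cost of more total input; an overlap of 0 yields the fewest chunks but no shared context.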
Prediction Trace - 3 Layers
Layer 1: Input Raw Text
Layer 2: Sentence Splitter
Layer 3: Chunk Creation
Model Quiz - 3 Questions
Test your understanding
What is the main purpose of the sentence splitter stage?
A. To combine sentences into paragraphs
B. To break text into smaller sentences for easier processing
C. To remove punctuation from the text
D. To translate text into another language
Key Insight
Splitting text into smaller, meaningful pieces helps models understand context better and improves processing efficiency. Training improves the splitting rules to reduce errors and optimize chunk sizes.