
Text chunking strategies in Prompt Engineering / GenAI - Model Pipeline Trace


This pipeline breaks long text into smaller, manageable pieces called chunks. Chunking lets AI models process documents that are too long to handle in one pass, by focusing on one small part at a time.

Data Flow - 5 Stages
Stage 1: Input Text
  Input: 1 document x 1000 words
  Action: Receive raw text document
  Output: 1 document x 1000 words
  Example: "Once upon a time in a faraway land, there was a small village surrounded by mountains..."

Stage 2: Preprocessing
  Input: 1 document x 1000 words
  Action: Clean text (remove punctuation, lowercase)
  Output: 1 document x 980 words
  Example: "once upon a time in a faraway land there was a small village surrounded by mountains"

Stage 3: Chunking
  Input: 1 document x 980 words
  Action: Split text into chunks of 100 words with a 20-word overlap
  Output: 12 chunks x 100 words each
  Example: Chunk 1: words 1-100, Chunk 2: words 81-180, ..., Chunk 12: words 881-980

Stage 4: Feature Extraction
  Input: 12 chunks x 100 words
  Action: Convert each chunk into a numerical vector (embedding)
  Output: 12 chunks x 100-dimensional vectors
  Example: [0.12, 0.45, ..., 0.33] for chunk 1

Stage 5: Model Input
  Input: 12 chunks x 100-dimensional vectors
  Action: Feed chunk vectors into the AI model for understanding or generation
  Output: Model processes each chunk independently
  Example: Model predicts sentiment or answers questions per chunk
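The stages above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual implementation: the input text is a repeated placeholder sentence, and the `embed` function is a toy hash-based stand-in for a real trained embedding model.

```python
import re

def preprocess(text):
    # Stage 2: lowercase and strip punctuation
    return re.sub(r"[^\w\s]", "", text.lower())

def chunk_words(words, size=100, overlap=20):
    # Stage 3: sliding window; consecutive chunks share `overlap` words,
    # so the window advances by stride = size - overlap each step
    stride = size - overlap
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break
    return chunks

def embed(chunk, dim=100):
    # Stage 4 (toy stand-in): hash each word into a fixed-size count
    # vector; a real pipeline would call a trained embedding model here
    vec = [0.0] * dim
    for w in chunk:
        vec[hash(w) % dim] += 1.0
    return vec

words = preprocess("Once upon a time " * 245).split()  # 980 placeholder words
chunks = chunk_words(words)        # 12 chunks of 100 words each
vectors = [embed(c) for c in chunks]  # 12 x 100-dimensional vectors
```

With 980 words, a chunk size of 100, and a 20-word overlap, the stride is 80, which reproduces the trace above: chunk 1 covers words 1-100, chunk 2 covers words 81-180, and chunk 12 covers words 881-980.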
Training Trace - Epoch by Epoch
Loss
0.85 | *
0.65 |   *
0.50 |     *
0.40 |       *
0.35 |         *
     +-----------
       1 2 3 4 5  Epochs
Epoch | Loss ↓ | Accuracy ↑ | Observation
------|--------|------------|------------
1     | 0.85   | 0.60       | Model starts learning from chunked text with moderate accuracy
2     | 0.65   | 0.72       | Loss decreases and accuracy improves as the model adapts to chunked inputs
3     | 0.50   | 0.80       | Model shows good understanding of chunks; accuracy keeps rising
4     | 0.40   | 0.85       | Training converges with lower loss and higher accuracy
5     | 0.35   | 0.88       | Final epoch shows stable performance on chunked text
Prediction Trace - 4 Layers
Layer 1: Input Chunk
Layer 2: Embedding Layer
Layer 3: Model Processing
Layer 4: Output Aggregation
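Layer 4 is the step where per-chunk results are combined into a single document-level answer. A minimal sketch, assuming the model emits one sentiment score per chunk (the scores below are hypothetical) and that simple averaging is an acceptable aggregation strategy:

```python
def aggregate(chunk_scores):
    # Layer 4 (sketch): combine per-chunk sentiment scores into one
    # document-level score by averaging; alternatives include majority
    # vote or weighting chunks by length or relevance
    return sum(chunk_scores) / len(chunk_scores)

# Hypothetical positive-sentiment probabilities for four chunks
scores = [0.9, 0.7, 0.8, 0.6]
doc_score = aggregate(scores)  # 0.75
```

Averaging treats every chunk equally; because neighboring chunks overlap, words in the overlap regions effectively count twice, which is usually acceptable but worth knowing when interpreting the aggregate.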
Model Quiz - 3 Questions
Test your understanding
Why do we use overlapping words between chunks?
A) To keep context between chunks
B) To reduce the total number of chunks
C) To make chunks shorter
D) To remove stop words
Key Insight
Breaking long text into overlapping chunks helps AI models preserve context across chunk boundaries and handle documents too long to process in one pass. The training trace above shows steadily falling loss and rising accuracy, consistent with chunking making long text easier to learn from in parts.