Prompt Engineering / GenAI · ~12 mins

Caching strategies for LLMs in Prompt Engineering / GenAI - Model Pipeline Trace

Model Pipeline - Caching strategies for LLMs

This pipeline shows how caching helps large language models (LLMs) respond faster by saving and reusing intermediate results instead of recomputing them.

Data Flow - 7 Stages
1. Input Text
   Input: 1 prompt string
   Process: User provides a text prompt to the LLM
   Output: 1 prompt string
   Example: "What is the weather today?"
2. Tokenization
   Input: 1 prompt string
   Process: Convert text into tokens (small pieces)
   Output: 1 prompt token list (e.g., 6 tokens)
   Example: ["What", "is", "the", "weather", "today", "?"]
3. Cache Lookup
   Input: 1 prompt token list
   Process: Check whether tokens or partial results are already saved in the cache
   Output: Cache hit or miss, with any cached token embeddings (empty on a full miss)
   Example: Cache hit for tokens ["What", "is"]
4. Embedding Computation
   Input: Tokens not in cache (e.g., 4 tokens)
   Process: Compute token embeddings for the new tokens
   Output: Embedding vectors for the new tokens (e.g., 4 vectors)
   Example: Computed embeddings for ["the", "weather", "today", "?"]
5. Cache Update
   Input: New token embeddings
   Process: Save the new embeddings into the cache for future reuse
   Output: Updated cache containing the new embeddings
   Example: Cache now stores embeddings for ["What", "is", "the", "weather", "today", "?"]
6. Model Inference
   Input: Full token embeddings (cached + new)
   Process: Run the LLM layers to generate output tokens
   Output: Output token probabilities
   Example: Model predicts next-word probabilities
7. Output Generation
   Input: Output token probabilities
   Process: Convert probabilities to text tokens and join them
   Output: Generated text string
   Example: "It is sunny today."
Training Trace - Epoch by Epoch

Loss
2.5 | *
2.0 |    *
1.5 |       *
1.0 |          *  *
0.5 |
    +-----------------
      1  2  3  4  5   Epochs
Epoch | Loss ↓ | Accuracy ↑ | Observation
------|--------|------------|-------------------------------------------------
  1   |  2.3   |   0.15     | Initial training with high loss and low accuracy
  2   |  1.8   |   0.30     | Loss decreased, accuracy improved as model learns
  3   |  1.4   |   0.45     | Continued improvement in loss and accuracy
  4   |  1.1   |   0.60     | Model converging; caching helps speed training
  5   |  0.9   |   0.70     | Stable decrease in loss, accuracy rising steadily
Prediction Trace - 6 Layers
Layer 1: Tokenization
Layer 2: Cache Lookup
Layer 3: Embedding Computation
Layer 4: Cache Update
Layer 5: Model Inference
Layer 6: Output Generation
Model Quiz - 3 Questions
Test your understanding
What is the main benefit of using a cache in LLMs?
A. Makes the model forget old data
B. Speeds up processing by reusing previous computations
C. Increases the size of the model
D. Changes the model architecture
Key Insight
Caching in LLMs saves time by storing and reusing token embeddings. This reduces repeated work during both training and prediction, making the model faster without changing its accuracy.
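This saving can be demonstrated with a short, hedged sketch: the 10 ms sleep below stands in for an expensive embedding or forward-pass computation, and Python's standard `functools.lru_cache` plays the role of the embedding cache.

```python
# Hedged demo of the key insight: compute each repeated value once,
# then reuse it. The sleep is a stand-in for real embedding work.
import time
from functools import lru_cache

@lru_cache(maxsize=None)
def cached_embed(token):
    time.sleep(0.01)  # pretend this computation is costly
    return hash(token)

tokens = ["What", "is", "the", "weather", "today", "?"] * 50  # 300 calls

start = time.perf_counter()
for t in tokens:
    cached_embed(t)
elapsed = time.perf_counter() - start

# Only the 6 unique tokens are actually computed; the other
# 294 calls are served from the cache.
print(cached_embed.cache_info())
print(f"elapsed: {elapsed:.3f}s (roughly 3s without the cache)")
```

With the cache, the loop pays the 10 ms cost six times instead of three hundred; the result is identical either way, which is why caching speeds things up without changing accuracy.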