
Text-to-image prompt crafting in Prompt Engineering / GenAI - Model Pipeline Trace

Model Pipeline - Text-to-image prompt crafting

This pipeline shows how a text-to-image AI model turns a text description into an image. It starts with the text input, converts it into numerical features, generates an image step by step, and outputs the final picture.

Data Flow - 5 Stages
1. Text Input
   Input: 1 text string
   Operation: The user writes a descriptive sentence or phrase.
   Output: 1 text string
   Example: "A cute brown puppy playing in a green park on a sunny day"
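Prompt crafting at this stage mostly means stating the subject, what it is doing, where it is, and conditions such as light or weather. The sketch below assembles the example prompt from those parts. `build_prompt` and its parameter names are hypothetical, chosen only for illustration; they are not part of any library.

```python
def build_prompt(subject: str, action: str = "", setting: str = "", conditions: str = "") -> str:
    """Hypothetical helper: join the non-empty prompt parts into one descriptive phrase."""
    parts = [subject, action, setting, conditions]
    # Skip empty parts so optional details can simply be left out.
    return " ".join(p.strip() for p in parts if p.strip())


prompt = build_prompt(
    subject="A cute brown puppy",
    action="playing",
    setting="in a green park",
    conditions="on a sunny day",
)
print(prompt)  # A cute brown puppy playing in a green park on a sunny day
```

Keeping the parts separate makes it easy to vary one detail (for example, the setting) while holding the rest of the prompt fixed.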
2. Text Tokenization
   Input: 1 text string
   Operation: Split the text into smaller pieces called tokens.
   Output: 1 sequence of tokens (e.g., 13 tokens)
   Example: ["A", "cute", "brown", "puppy", "playing", "in", "a", "green", "park", "on", "a", "sunny", "day"]
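A minimal sketch of this step, assuming simple whitespace splitting as in the example. Production tokenizers (for example, the byte-pair-encoding tokenizer used by CLIP text encoders) work on subword pieces and add special start and end tokens, so their token counts can differ. The vocabulary here is built on the fly purely for illustration, and the sketch lowercases tokens so "A" and "a" share one ID.

```python
def tokenize(text: str) -> list[str]:
    """Simplified word-level tokenizer: split on whitespace."""
    return text.split()


def to_ids(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Map each token to an integer ID, adding unseen (lowercased) tokens to the vocabulary."""
    ids = []
    for tok in tokens:
        key = tok.lower()
        if key not in vocab:
            vocab[key] = len(vocab)
        ids.append(vocab[key])
    return ids


vocab: dict[str, int] = {}
tokens = tokenize("A cute brown puppy playing in a green park on a sunny day")
ids = to_ids(tokens, vocab)
print(len(tokens))  # 13
print(ids)          # [0, 1, 2, 3, 4, 5, 0, 6, 7, 8, 0, 9, 10]
```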
3. Text Embedding
   Input: 1 sequence of tokens
   Operation: Convert each token into a vector of numbers that represents its meaning.
   Output: 1 sequence of vectors (e.g., 13 vectors of size 768)
   Example: [[0.12, -0.05, ...], [0.33, 0.44, ...], ...]
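The core of this step can be sketched as a table lookup: each token ID selects one row of an embedding table. In a real text encoder the table is learned during training, and the looked-up vectors are then passed through a transformer so each vector also reflects its context; this sketch covers only the lookup. The random table and the token IDs are hypothetical placeholders.

```python
import random

EMBED_DIM = 768  # vector size used in the example above


def make_embedding_table(vocab_size: int, dim: int, seed: int = 0) -> list[list[float]]:
    """Hypothetical embedding table filled with small random values (a trained model learns these)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(vocab_size)]


def embed(token_ids: list[int], table: list[list[float]]) -> list[list[float]]:
    """Look up one vector per token ID."""
    return [table[i] for i in token_ids]


# Hypothetical IDs for the 13 example tokens; a repeated ID means a repeated word ("a").
token_ids = [0, 1, 2, 3, 4, 5, 0, 6, 7, 8, 0, 9, 10]
table = make_embedding_table(vocab_size=11, dim=EMBED_DIM)
vectors = embed(token_ids, table)
print(len(vectors), len(vectors[0]))  # 13 768
```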
4. Conditioning the Image Generator
   Input: 1 sequence of vectors
   Operation: Use the text vectors to guide image creation.
   Output: Conditioned latent space, ready for image generation
   Example: latent vectors influenced by "puppy", "park", and "sunny"
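In Stable Diffusion-style models, this conditioning is commonly done with cross-attention: positions in the image latent act as queries, and the text vectors act as keys and values, so each latent position pulls in the words most relevant to it. Below is a minimal single-head sketch that omits the learned query, key, and value projection matrices a real model uses; the tiny 4-dimensional vectors are hypothetical stand-ins for "puppy", "park", and "sunny".

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(latents, text_vectors):
    """Each latent vector (query) attends over the text vectors (keys and values).

    Simplified: single head, no learned projections.
    """
    d = len(text_vectors[0])
    out, weights = [], []
    for q in latents:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in text_vectors]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of text vectors: the text information flowing into this latent position.
        out.append([sum(wj * v[i] for wj, v in zip(w, text_vectors)) for i in range(d)])
    return out, weights


text = [[1.0, 0.0, 0.0, 0.0],   # stands in for "puppy"
        [0.0, 1.0, 0.0, 0.0],   # stands in for "park"
        [0.0, 0.0, 1.0, 0.0]]   # stands in for "sunny"
latents = [[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]]
conditioned, attn = cross_attention(latents, text)
print([round(max(w), 2) for w in attn])  # [0.58, 0.58]
```

The first latent position attends mostly to "puppy" and the second mostly to "sunny", which is the sense in which the latent is "influenced by" particular words.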
5. Image Generation
   Input: Conditioned latent space
   Operation: Generate the image pixels step by step using a diffusion or transformer model.
   Output: 1 image (e.g., 512 x 512 pixels x 3 color channels)
   Example: an image showing a brown puppy in a green park
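For a diffusion model, "step by step" means a denoising loop: start from random noise and repeatedly remove the noise that a trained network predicts. The sketch below uses the deterministic DDIM update rule with a cosine-style noise schedule. The trained, text-conditioned network is replaced by a hypothetical `oracle_noise` function that already knows the target, so the loop is guaranteed to land on it; in a real latent diffusion model the loop also runs in latent space, and a decoder turns the final latent into pixels.

```python
import math
import random

T = 50  # number of denoising steps (illustrative)


def alpha_bar(t: int) -> float:
    """Fraction of signal remaining at step t (cosine-style schedule; 1.0 at t = 0)."""
    return math.cos(0.5 * math.pi * t / (T + 1)) ** 2


def oracle_noise(x_t, t, target):
    """Hypothetical stand-in for the trained network: returns the exact noise contained in x_t."""
    a = alpha_bar(t)
    return [(x - math.sqrt(a) * y) / math.sqrt(1.0 - a) for x, y in zip(x_t, target)]


def ddim_sample(target, seed=0):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure noise
    for t in range(T, 0, -1):
        a_t, a_prev = alpha_bar(t), alpha_bar(t - 1)
        eps = oracle_noise(x, t, target)
        # Estimate the clean sample, then re-noise it to the lower noise level of step t - 1.
        x0_hat = [(xi - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t) for xi, e in zip(x, eps)]
        x = [math.sqrt(a_prev) * x0 + math.sqrt(1.0 - a_prev) * e for x0, e in zip(x0_hat, eps)]
    return x


target = [0.2, -0.5, 0.9]  # hypothetical tiny "image" with values in [-1, 1]
sample = ddim_sample(target)
print([round(v, 3) for v in sample])  # [0.2, -0.5, 0.9]
```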
Training Trace - Epoch by Epoch

[Figure: training loss (y-axis, 0.0 to 2.5) versus epochs (x-axis, 1 to 20); the loss falls steadily as training proceeds.]
Epoch | Loss ↓ | Accuracy ↑ | Observation
1     | 2.5    | 0.10       | Model starts learning basic text-image connections
5     | 1.8    | 0.35       | Model improves understanding of objects and colors
10    | 1.2    | 0.55       | Better image details and text alignment
15    | 0.8    | 0.70       | Clearer images, more accurate to prompts
20    | 0.5    | 0.85       | High quality images matching text well
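The trace does not say which loss is being tracked. For diffusion-based generators, a common choice (the simplified objective from the DDPM formulation) is the mean squared error between the noise added to a training image and the noise the model predicts. The sketch below assumes that objective, a cosine-style schedule, and a hypothetical `untrained` model that always predicts zero noise; a real model would also receive the text vectors as input.

```python
import math
import random


def noisy_sample(x0, t, T, rng):
    """Forward diffusion: mix a clean sample with Gaussian noise at step t."""
    a = math.cos(0.5 * math.pi * t / (T + 1)) ** 2  # cosine-style schedule (assumed)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return x_t, eps


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def training_loss(predict_noise, x0, T=1000, seed=0):
    """One-sample denoising loss: distance between predicted and true noise at a random step."""
    rng = random.Random(seed)
    t = rng.randint(1, T)
    x_t, eps = noisy_sample(x0, t, T, rng)
    return mse(predict_noise(x_t, t), eps)


def untrained(x_t, t):
    """Hypothetical untrained model: always predicts zero noise."""
    return [0.0] * len(x_t)


print(round(training_loss(untrained, [0.2, -0.5, 0.9]), 3))
```

Training lowers this loss by making the predicted noise closer to the true noise, which is what the falling loss column reflects under this assumption.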
Prediction Trace - 4 Layers
Layer 1: Text Tokenization
Layer 2: Text Embedding
Layer 3: Conditioning Image Generator
Layer 4: Image Generation
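The four layers can be traced by the shape of the data each one outputs. The sketch below propagates shapes only, assuming word-level tokens, 768-dimensional embeddings, and a Stable Diffusion v1-style latent of 4 x 64 x 64 that a decoder upsamples by a factor of 8; the function names are hypothetical. One design point it makes visible: the text enters through attention, so the latent and image sizes do not depend on how long the prompt is.

```python
def tokenize_shape(prompt: str) -> tuple:
    return (len(prompt.split()),)  # simplified word-level tokens


def embed_shape(shape: tuple, dim: int = 768) -> tuple:
    return (shape[0], dim)  # one vector per token


def condition_shape(shape: tuple, latent: tuple = (4, 64, 64)) -> tuple:
    # The text shape is consumed by attention; the latent size is fixed (assumed 4 x 64 x 64).
    return latent


def generate_shape(shape: tuple, scale: int = 8) -> tuple:
    _channels, h, w = shape
    return (h * scale, w * scale, 3)  # decoded RGB image


def trace(prompt: str) -> list:
    steps = []
    shape = tokenize_shape(prompt)
    steps.append(("Layer 1: Text Tokenization", shape))
    shape = embed_shape(shape)
    steps.append(("Layer 2: Text Embedding", shape))
    shape = condition_shape(shape)
    steps.append(("Layer 3: Conditioning Image Generator", shape))
    shape = generate_shape(shape)
    steps.append(("Layer 4: Image Generation", shape))
    return steps


for name, shape in trace("A cute brown puppy playing in a green park on a sunny day"):
    print(f"{name}: {shape}")
```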
Model Quiz - 3 Questions
Test your understanding
What is the first step the model takes after receiving the text prompt?
A. Converting image to text
B. Generating the image pixels
C. Splitting the text into tokens
D. Applying color filters
Key Insight
Text-to-image models work by turning words into numbers that guide image creation step by step. Training improves how closely the generated images match the text, as shown by the decreasing loss and increasing accuracy in the training trace.