Stable Diffusion is a model that generates images from text prompts. To evaluate it, we check how realistic the images are and how relevant they are to the prompt. Common metrics include FID (Fréchet Inception Distance), which measures how close the distribution of generated images is to that of real images, and CLIP score, which measures how well an image matches its text description. These metrics matter because together they tell us whether the images look good and fit the prompt.
Stable Diffusion does not use a confusion matrix because it is a generative model, not a classifier. Instead, we use visual examples and scores like FID and CLIP to evaluate quality.
FID example: FID compares a set of real images with a set of generated images. Lower FID = better quality.
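FID fits a Gaussian to feature vectors of the real images and another to feature vectors of the generated images, then measures the Fréchet distance between them: ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2(S_r S_g)^(1/2)). The minimal NumPy sketch below assumes the features are already extracted (standard FID uses 2048-dimensional Inception-v3 pooling features; that step is not shown), and the function names are illustrative.

```python
import numpy as np


def _sqrt_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(mat)
    w = np.clip(w, 0.0, None)  # remove tiny negative eigenvalues caused by rounding
    return (v * np.sqrt(w)) @ v.T


def frechet_distance(real_feats: np.ndarray, gen_feats: np.ndarray) -> float:
    """FID-style distance between two (n_samples, n_features) feature arrays."""
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)

    # Tr((S_r S_g)^(1/2)) equals Tr((S_r^(1/2) S_g S_r^(1/2))^(1/2)); the inner
    # matrix is symmetric PSD, so its eigenvalues are real and non-negative.
    s = _sqrt_psd(cov_r)
    eig = np.clip(np.linalg.eigvalsh(s @ cov_g @ s), 0.0, None)
    tr_covmean = np.sqrt(eig).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_covmean)
```

Identical feature sets give an FID of about 0. Shifting every feature of a 4-dimensional set by 1 leaves the covariance unchanged, so the FID is about 4 and comes entirely from the mean term.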
CLIP score example: for the text prompt "A cat sitting on a chair", a generated image that matches the prompt well gets a high CLIP score.
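CLIP score is typically computed as the cosine similarity between the CLIP embedding of the generated image and the CLIP embedding of the prompt, clipped at zero. Some implementations multiply it by 100, so a threshold of 0.3 on one scale corresponds to 30 on the other. The sketch below assumes the two embeddings were already produced by a CLIP model (not shown) and uses tiny made-up vectors in place of real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_score(image_embedding: Sequence[float], text_embedding: Sequence[float]) -> float:
    """Unscaled CLIP-style score: cosine similarity clipped at 0 (assumed convention)."""
    return max(cosine_similarity(image_embedding, text_embedding), 0.0)


# Hypothetical toy embeddings: an image that "points the same way" as its prompt
# scores high, an unrelated one scores 0.
print(clip_score([0.9, 0.1, 0.0], [1.0, 0.0, 0.0]))
print(clip_score([0.0, 1.0, 0.0], [1.0, 0.0, 0.0]))
```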
For Stable Diffusion, precision refers to fidelity: how realistic, clear, and detailed the generated images are. Recall refers to diversity: how varied the images can be for the same prompt.
If the model focuses too much on precision, images look sharp but may be very similar (low diversity). If it focuses on recall, images vary a lot but may be blurry or less accurate.
Example: for the prompt "a red apple", high precision means every apple looks very realistic and red. High recall means the apples vary in shape or style while still being red.
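The informal definitions above can be made concrete with the k-nearest-neighbor precision/recall estimate used in the generative-model literature: a generated sample counts toward precision if it falls inside the k-NN neighborhood of some real sample, and a real sample counts toward recall if it falls inside the neighborhood of some generated sample. This sketch assumes image features were already extracted (for example with an Inception or CLIP encoder, not shown). The function names and the default k=3 are illustrative, and distances are computed brute force, so it only suits small sets.

```python
import numpy as np


def knn_radii(feats: np.ndarray, k: int) -> np.ndarray:
    """Distance from each point to its k-th nearest neighbor within the same set."""
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    # Column 0 of each sorted row is the point itself (distance 0).
    return np.sort(d, axis=1)[:, k]


def coverage(queries: np.ndarray, support: np.ndarray, k: int) -> float:
    """Fraction of query points lying inside at least one support point's k-NN ball."""
    radii = knn_radii(support, k)
    d = np.linalg.norm(queries[:, None, :] - support[None, :, :], axis=-1)
    inside = (d <= radii[None, :]).any(axis=1)
    return float(inside.mean())


def precision_recall(real: np.ndarray, fake: np.ndarray, k: int = 3):
    precision = coverage(fake, real, k)  # are generated samples realistic?
    recall = coverage(real, fake, k)     # is the real variety covered?
    return precision, recall
```

For example, ten real points at 0..9 on a line against ten generated points all stuck at 0 give precision 1.0 but recall 0.1: every output is "realistic", yet almost none of the real variety is covered, which is the sharp-but-similar case described above.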
Good: FID below 30 means generated images are close to real images. CLIP score above 0.3 means images match text well. Images look sharp, colorful, and relevant.
Bad: FID above 100 means images look very different from real ones. CLIP score below 0.1 means images do not match the prompt. Images may be blurry, strange, or unrelated.
- Overfitting: Model may memorize training images, producing less diverse outputs.
- Data leakage: Using test images in training can falsely improve metrics.
- Ignoring diversity: Only checking image quality but not variety can mislead about model performance.
- Misinterpreting metrics: Low FID alone does not guarantee good text-image match; use CLIP score too.
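The data-leakage pitfall can be checked cheaply for exact duplicates by hashing the raw bytes of every training image and looking for the same hashes among the evaluation images. This is a minimal sketch with hypothetical function names, assuming images are available as bytes. It only catches byte-identical files; resized or re-encoded copies would need perceptual hashing or embedding similarity instead.

```python
import hashlib
from typing import Iterable, List


def image_digest(data: bytes) -> str:
    """SHA-256 of the raw image bytes."""
    return hashlib.sha256(data).hexdigest()


def find_leaked(train_images: Iterable[bytes], eval_images: List[bytes]) -> List[int]:
    """Indices of evaluation images whose exact bytes also appear in the training set."""
    train_hashes = {image_digest(b) for b in train_images}
    return [i for i, b in enumerate(eval_images) if image_digest(b) in train_hashes]
```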
Your Stable Diffusion model has an FID of 25 but a CLIP score of 0.05. Is it good?
Answer: No, because while the images look realistic (low FID), they do not match the text prompts well (very low CLIP score). The model needs improvement to better understand and generate images that fit the text.
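The good/bad thresholds given earlier (FID below 30 good, above 100 bad; CLIP score above 0.3 good, below 0.1 bad) can be encoded as a small checker that reproduces this reasoning. The "borderline" label for values between the thresholds is an assumption, since the text does not say how to read them. The CLIP thresholds assume the unscaled cosine-similarity score.

```python
from typing import Dict, Tuple


def fid_verdict(fid: float) -> str:
    if fid < 30:
        return "good"
    if fid > 100:
        return "bad"
    return "borderline"  # assumed label for the unspecified middle range


def clip_verdict(clip: float) -> str:
    if clip > 0.3:
        return "good"
    if clip < 0.1:
        return "bad"
    return "borderline"  # assumed label for the unspecified middle range


def assess(fid: float, clip: float) -> Tuple[Dict[str, str], str]:
    """Both metrics must pass: low FID alone does not guarantee prompt adherence."""
    verdicts = {"fid": fid_verdict(fid), "clip": clip_verdict(clip)}
    overall = "good" if all(v == "good" for v in verdicts.values()) else "needs improvement"
    return verdicts, overall


print(assess(25, 0.05))
```

For FID 25 and CLIP score 0.05, the checker marks FID as good but CLIP as bad, so the overall verdict is "needs improvement", matching the answer above.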
Practice
Solution
Step 1: Understand Stable Diffusion's function
Stable Diffusion is designed to generate images from text prompts.
Step 2: Compare with other options
The other options describe different AI tasks unrelated to image generation.
Final Answer: To create images from text descriptions -> Option C
Quick Check: Stable Diffusion = image generation from text [OK]
- Confusing Stable Diffusion with language translation
- Thinking it analyzes data instead of creating images
- Mixing it up with spam detection tools
Solution
Step 1: Identify the proper prompt format
Stable Diffusion accepts text prompts as plain strings describing the image.
Step 2: Check the options for correct syntax
Only "A sunny beach with palm trees" is a simple text string suitable as a prompt.
Final Answer: "A sunny beach with palm trees" -> Option A
Quick Check: Prompt = plain text string [OK]
- Using code-like syntax instead of plain text
- Omitting quotes around the prompt
- Mixing function calls with prompt text
Given the prompt "A cat sitting on a red chair", what kind of output should Stable Diffusion produce?
Solution
Step 1: Understand how the prompt relates to the output
Stable Diffusion generates images based on text prompts.
Step 2: Match the prompt to the output type
The prompt describes a scene, so the output is an image of that scene.
Final Answer: An image showing a cat sitting on a red chair -> Option B
Quick Check: Text prompt -> image output [OK]
- Expecting text output instead of image
- Confusing image generation with video creation
- Thinking it lists information instead of creating visuals
A user gives Stable Diffusion the prompt "A futuristic cityscape at night", but the output image is blurry and unclear. What is a likely cause?
Solution
Step 1: Analyze the impact of prompt clarity
Simple or vague prompts can produce unclear images because the model lacks the detail it needs to generate sharp visuals.
Step 2: Evaluate the other options
Stable Diffusion supports night scenes and color images, and prompt length is not the main issue here.
Final Answer: The prompt was too simple or vague -> Option D
Quick Check: Clear prompts = better images [OK]
- Assuming model can't create night scenes
- Thinking Stable Diffusion only makes black and white images
- Blaming prompt length instead of prompt detail
Solution
Step 1: Understand the effect of prompt specificity
Adding more descriptive detail helps the model focus on the correct colors and objects.
Step 2: Evaluate the other options
Shorter or vaguer prompts reduce clarity, while switching models unnecessarily or removing color words won't fix the color issue.
Final Answer: Add more detail to the prompt, such as "a bright red apple on a rustic wooden table" -> Option A
Quick Check: Detailed prompts improve image accuracy [OK]
- Using vague or too short prompts
- Ignoring color details in the prompt
- Switching models without reason
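The advice to add descriptive detail (colors, materials, setting) can be sketched as a tiny prompt-building helper. The function name and its parameters are hypothetical and not part of any Stable Diffusion API; the result is just a plain text string, which is all a prompt is.

```python
from typing import List, Optional


def build_prompt(subject: str, attributes: Optional[List[str]] = None,
                 setting: Optional[str] = None) -> str:
    """Combine a subject with descriptive attributes and an optional setting."""
    phrase = " ".join((attributes or []) + [subject])
    # Simple a/an heuristic based on the first letter; words like "hour" or
    # "unicorn" would need special handling.
    article = "an" if phrase[0].lower() in "aeiou" else "a"
    prompt = f"{article} {phrase}"
    if setting:
        prompt += f" {setting}"
    return prompt


print(build_prompt("apple"))
print(build_prompt("apple", ["bright red"], "on a rustic wooden table"))
```

The first call yields the vague prompt "an apple"; the second yields the detailed prompt "a bright red apple on a rustic wooden table" from the answer above.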
