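For generative models that create visual content, the key metrics are Inception Score (IS) and Fréchet Inception Distance (FID). Both measure how realistic and how diverse the generated images are, using a pretrained Inception image classifier. IS checks whether each generated image is classified confidently, meaning it looks like a recognizable object, and whether the predicted classes are spread across many categories, meaning the images are varied. FID compares the feature statistics of generated images with those of real images, so it reflects both quality and diversity. These metrics matter because generated visuals must look believable and must not be repetitive.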
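The IS computation is short once the classifier outputs are available. Below is a minimal sketch in Python with NumPy. It assumes `probs` is an (N, C) array whose rows are the softmax class probabilities p(y|x) for N generated images. In the standard definition these come from an ImageNet-trained Inception network, which this sketch does not load. It implements IS = exp(E_x[KL(p(y|x) || p(y))]), where p(y) is the average of the rows. The function name `inception_score` and the toy inputs are illustrative.

```python
import numpy as np


def inception_score(probs: np.ndarray, eps: float = 1e-12) -> float:
    """Inception Score from an (N, C) array of class probabilities p(y|x)."""
    probs = np.asarray(probs, dtype=np.float64)
    # Marginal class distribution p(y): average prediction over all generated images.
    p_y = probs.mean(axis=0, keepdims=True)
    # Per-image KL(p(y|x) || p(y)); eps keeps log() finite when a probability is 0.
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    # IS is the exponential of the mean KL divergence.
    return float(np.exp(kl.mean()))


# Toy inputs with C = 4 classes (e.g. cat, dog, car, tree).
confident_and_varied = np.eye(4)                   # each image confidently a different class
collapsed = np.tile([1.0, 0.0, 0.0, 0.0], (4, 1))  # every image is a confident "cat"
unsure = np.full((4, 4), 0.25)                     # every prediction is uniform

print(inception_score(confident_and_varied))  # close to 4.0, the maximum for 4 classes
print(inception_score(collapsed))             # 1.0
print(inception_score(unsure))                # 1.0
```

Confident predictions spread evenly over C classes give the maximum score of C. Predictions that all point to the same class (mode collapse) or that are uniformly unsure (blurry images) give the minimum score of 1. Published IS values are computed on many thousands of images and are often averaged over several splits.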
Unlike classifiers, generative models have no confusion matrix, because there are no true labels to compare their outputs against. Instead, we inspect example images alongside metric scores.
Real Images: [Cat, Dog, Car, Tree]
Generated Images: [Cat-like, Dog-like, Car-like, Tree-like]
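Inception Score: 7.5 (higher is better; the maximum equals the number of classes the Inception network predicts, 1000 for ImageNet)
FID Score: 25.0 (lower is better; 0 means the feature statistics of generated and real images match exactly)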
This shows how close generated images are to real ones in quality and variety.
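FID fits a Gaussian to the Inception features of the real images (mean μ_r, covariance Σ_r) and of the generated images (μ_g, Σ_g), then computes the Fréchet distance ||μ_r - μ_g||² + Tr(Σ_r + Σ_g - 2(Σ_r Σ_g)^(1/2)). The sketch below takes the feature arrays as given. Standard FID uses 2048-dimensional pooling features from Inception-v3; here random toy features stand in for them, and the function name `frechet_distance` is illustrative. Many implementations take the matrix square root with `scipy.linalg.sqrtm`. To stay NumPy-only, this version uses the identity Tr((Σ_r Σ_g)^(1/2)) = Tr((Σ_r^(1/2) Σ_g Σ_r^(1/2))^(1/2)), whose inner matrix is symmetric, so `numpy.linalg.eigh` applies.

```python
import numpy as np


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """FID-style distance between two (N, D) feature arrays, with D >= 2."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)

    # Symmetric square root of sigma_r via its eigendecomposition.
    w, v = np.linalg.eigh(sigma_r)
    sqrt_r = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T
    # Tr((sigma_r sigma_g)^(1/2)) = sum of sqrt eigenvalues of sqrt_r @ sigma_g @ sqrt_r.
    inner = sqrt_r @ sigma_g @ sqrt_r
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_covmean)


rng = np.random.default_rng(0)
real = rng.normal(size=(500, 3))  # toy stand-in for Inception features of real images

print(frechet_distance(real, real))        # approximately 0: identical statistics
print(frechet_distance(real, real + 2.0))  # approximately 12.0: mean shift of 2 in each of 3 dims
```

Shifting every feature by 2 leaves the covariances unchanged, so only the mean term contributes: 3 × 2² = 12.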
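For generative models, precision is the fraction of generated images that look real, meaning they resemble some real image. Recall is how much of the variety in the real images the model can reproduce, for example how many different types of objects it can create.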
Example: A model that only creates perfect cats has high precision but low recall (no dogs or cars). Another model creates many types but some look blurry, so recall is high but precision is low.
Good models balance both: images look real and cover many categories.
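One concrete formulation of these two ideas is the k-nearest-neighbour version from Kynkäänniemi et al. (2019, "Improved Precision and Recall Metric for Assessing Generative Models"). Precision is the fraction of generated samples that fall inside the k-NN ball of at least one real sample. Recall is the fraction of real samples that fall inside the k-NN ball of at least one generated sample. The sketch below is a small NumPy version of that idea on toy 2-D points standing in for image features. The function names, `k=3`, and the two-cluster data are assumptions for illustration. The brute-force distance matrix only suits small sample counts.

```python
import numpy as np


def knn_radii(feats: np.ndarray, k: int) -> np.ndarray:
    # Distance from each point to its k-th nearest neighbour in the same set.
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    # After sorting, column 0 is the point itself (distance 0), so column k is the k-th neighbour.
    return np.sort(d, axis=1)[:, k]


def coverage(queries: np.ndarray, refs: np.ndarray, radii: np.ndarray) -> float:
    # Fraction of queries lying inside at least one reference point's k-NN ball.
    d = np.linalg.norm(queries[:, None, :] - refs[None, :, :], axis=-1)
    return float(np.mean(np.any(d <= radii[None, :], axis=1)))


def precision_recall(real: np.ndarray, gen: np.ndarray, k: int = 3) -> tuple[float, float]:
    precision = coverage(gen, real, knn_radii(real, k))  # generated samples that look real
    recall = coverage(real, gen, knn_radii(gen, k))      # real samples the model reproduces
    return precision, recall


rng = np.random.default_rng(0)
cats = rng.normal(loc=0.0, size=(100, 2))   # toy "cat" features
dogs = rng.normal(loc=10.0, size=(100, 2))  # toy "dog" features
real = np.vstack([cats, dogs])

collapsed_gen = rng.normal(loc=0.0, size=(100, 2))  # generator that only makes cats
p, r = precision_recall(real, collapsed_gen)
print(f"precision={p:.2f} recall={r:.2f}")  # high precision, recall at most 0.50
```

This matches the "only perfect cats" example: the generator covers the cat cluster well, so precision stays high, but it never reaches the dog cluster, so at most half of the real samples are recalled.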
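The thresholds below are rough rules of thumb: both scores depend on the dataset, image resolution, and number of generated samples, so compare only scores computed under the same setup.
Good values: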
- Inception Score (IS) above 7 means images are realistic and varied.
- FID below 30 means generated images are close to real images.
Bad values:
- IS below 3 means images are poor quality or repetitive.
- FID above 100 means images look very different from real ones.
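The rules of thumb above can be written as a tiny check. `rate_scores` is a hypothetical helper, not part of any library, that hard-codes these rough, dataset-dependent thresholds. It treats a model as bad if either metric falls in the bad range, since one good score does not rescue the other, and reports anything in between as mixed.

```python
def rate_scores(inception_score: float, fid: float) -> str:
    """Hypothetical helper applying the rough thresholds above; not a standard API."""
    if inception_score < 3 or fid > 100:
        return "bad"    # either metric alone is enough to flag a problem
    if inception_score > 7 and fid < 30:
        return "good"   # realistic, varied, and close to the real images
    return "mixed"      # somewhere in between; inspect samples by eye


print(rate_scores(7.5, 25.0))   # good  (the example scores shown earlier)
print(rate_scores(8.0, 150.0))  # bad   (sharp images that are far from real ones)
print(rate_scores(5.0, 50.0))   # mixed
```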
Common pitfalls include:
- Mode collapse: The model generates only a few kinds of images over and over, so diversity (recall) is low even though precision may be high.
- Overfitting: The model memorizes its training images, so metrics can look good even though it produces no genuinely new images.
- Misleading IS: A high IS can occur when images are sharp but unrealistic, because IS never compares generated images with real ones.
- Data leakage: Using test images in training can falsely improve metrics.
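For the overfitting pitfall, a common sanity check is to compare each generated image with its nearest training image in feature space: many near-zero distances suggest the model is copying rather than generating. The sketch below assumes features have already been extracted (toy random vectors here). Both the function name `memorized_fraction` and the cutoff `threshold=0.1` are hypothetical; a useful cutoff depends on the feature space and has to be chosen per dataset.

```python
import numpy as np


def memorized_fraction(gen_feats: np.ndarray, train_feats: np.ndarray, threshold: float) -> float:
    # Distance from every generated sample to every training sample.
    d = np.linalg.norm(gen_feats[:, None, :] - train_feats[None, :, :], axis=-1)
    # Closest training sample for each generated one.
    nearest = d.min(axis=1)
    # Share of generated samples that are near-copies of some training sample.
    return float(np.mean(nearest < threshold))


rng = np.random.default_rng(0)
train = rng.normal(size=(200, 8))
copies = train[:50] + 1e-3 * rng.normal(size=(50, 8))  # near-duplicates of training data
novel = rng.normal(size=(50, 8))                        # genuinely new samples
gen = np.vstack([copies, novel])

print(memorized_fraction(gen, train, threshold=0.1))  # 0.5: half the samples are copies
```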
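Self-check: a model with a high IS but also a high FID is not good. The high IS means the model creates sharp, confidently classifiable images, but the high FID means their statistics are far from those of real images. The images might look unrealistic or contain artifacts, so the model needs improvement before it generates realistic visuals.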