
Image generation basics in PyTorch - Model Metrics & Evaluation




Which Metrics Matter for Image Generation and Why

For image generation, we want to know how close the generated images are to real ones. Common metrics are:

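Inception Score (IS): Uses a pretrained Inception-v3 classifier to check whether each image is classified confidently (clear) and whether the images spread across many classes (varied); higher is better.
Fréchet Inception Distance (FID): Compares the statistics (mean and covariance) of Inception features for generated and real images; lower is better.
Perceptual Loss: Measures how similar two images are in the feature space of a pretrained network (often VGG), which tracks differences humans notice better than pixel-wise error. Because it compares a generated image with a specific target, it is used mainly as a training loss or for paired tasks rather than for scoring a whole generator.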

These metrics help us understand if the model creates realistic and diverse images.
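The Inception Score is defined as IS = exp(E_x[KL(p(y|x) || p(y))]). Here p(y|x) is the class distribution that a pretrained Inception-v3 network predicts for one generated image, and p(y) is the average of those predictions over all generated images. The score is high when each image gets a confident prediction and the predictions spread over many classes.

The minimal NumPy sketch below computes IS from an (N, C) array of predicted probabilities. It assumes the probabilities were already obtained from an Inception-v3 model elsewhere. `inception_score` is a hypothetical helper name, not a library API.

```python
import numpy as np

def inception_score(probs: np.ndarray, eps: float = 1e-12) -> float:
    """IS = exp(mean over images of KL(p(y|x) || p(y))) for an (N, C) probability array."""
    probs = np.asarray(probs, dtype=np.float64)
    # p(y): marginal class distribution, the average prediction over all generated images
    p_y = probs.mean(axis=0, keepdims=True)
    # Per-image KL divergence; eps avoids log(0), and entries with probs == 0 contribute 0
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

# Toy inputs standing in for Inception-v3 softmax outputs (hypothetical data):
varied = np.eye(10)                          # each image confidently in a different class
collapsed = np.tile(np.eye(10)[0], (10, 1))  # every image confidently in class 0
print(inception_score(varied))     # close to 10, the maximum for 10 classes
print(inception_score(collapsed))  # 1.0, the minimum
```

Confident predictions spread evenly over C classes reach the maximum score of C. If every image lands in the same class, the score falls to 1. Published IS values are commonly computed on tens of thousands of samples averaged over several splits, so a score from a handful of images is not comparable to them.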

Confusion Matrix or Equivalent Visualization

Image generation does not use a confusion matrix as classification does. Instead, we compare the distributions of features that a pretrained Inception-v3 network extracts from real and generated images:

Real image features: mean μ_r, covariance Σ_r
Generated image features: mean μ_g, covariance Σ_g

FID = ||μ_r - μ_g||^2 + Trace(Σ_r + Σ_g - 2(Σ_r Σ_g)^(1/2))

Lower FID means generated images are closer to real ones.
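The formula above translates almost term by term into code. This sketch assumes the features are already extracted, with one row per image (for example, 2048-dimensional pooled Inception-v3 activations), and uses NumPy.

It makes one deliberate swap. Reference implementations such as pytorch-fid compute the full matrix square root with `scipy.linalg.sqrtm`. Here only its trace is needed, and that trace equals the sum of the square roots of the eigenvalues of Σ_r Σ_g. The function names are hypothetical.

```python
import numpy as np

def frechet_distance(mu_r, sigma_r, mu_g, sigma_g) -> float:
    """||mu_r - mu_g||^2 + Tr(sigma_r + sigma_g - 2 (sigma_r sigma_g)^(1/2))."""
    mu_r, mu_g = np.asarray(mu_r, dtype=np.float64), np.asarray(mu_g, dtype=np.float64)
    sigma_r = np.asarray(sigma_r, dtype=np.float64)
    sigma_g = np.asarray(sigma_g, dtype=np.float64)
    mean_term = float(np.sum((mu_r - mu_g) ** 2))
    # Tr((sigma_r sigma_g)^(1/2)) = sum of sqrt of the eigenvalues of sigma_r @ sigma_g.
    # These eigenvalues are real and non-negative for covariance matrices; round-off can
    # leave tiny imaginary or negative parts, hence .real and clip.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_sqrt = float(np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None))))
    return mean_term + float(np.trace(sigma_r) + np.trace(sigma_g)) - 2.0 * tr_sqrt

def fid_from_features(feats_r: np.ndarray, feats_g: np.ndarray) -> float:
    """FID between two feature arrays of shape (num_images, feature_dim)."""
    mu_r, mu_g = feats_r.mean(axis=0), feats_g.mean(axis=0)
    sigma_r = np.cov(feats_r, rowvar=False)  # rows are images, columns are features
    sigma_g = np.cov(feats_g, rowvar=False)
    return frechet_distance(mu_r, sigma_r, mu_g, sigma_g)

# Toy features standing in for Inception activations (hypothetical data):
rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 8))
print(fid_from_features(real, real))        # ~0: identical sets
print(fid_from_features(real, real + 1.0))  # ~8: a shift of 1 in each of 8 dimensions
```

Shifting every feature by a constant leaves the covariance unchanged, so only the mean term grows, by the squared length of the shift. In practice FID is estimated from thousands of samples, because the covariance of high-dimensional features is poorly estimated from small sets and the estimate is biased upward.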

Tradeoff: Diversity vs Quality

In image generation, we want images that are both diverse and high quality.

High Quality, Low Diversity: Images look very clear but all look similar (mode collapse).
High Diversity, Low Quality: Images vary a lot but look blurry or unrealistic.

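Inception Score rewards both diversity and quality, but only as judged by a classifier and without ever looking at real images. FID compares against real images directly, so it penalizes unrealistic images and missing diversity alike, because both change the mean or covariance of the generated features.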
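A small worked case shows why FID penalizes lost diversity even when the average feature is right. Assume, purely for illustration, that real features have covariance I_d (the d × d identity). Assume also that a partially collapsed generator matches the mean exactly but shrinks the covariance to sI_d with s ≥ 0:

```latex
\begin{aligned}
\mathrm{FID} &= \lVert \mu_r - \mu_g \rVert^2
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\right) \\
&= 0 + \operatorname{Tr}\!\left(I_d + s I_d - 2\sqrt{s}\, I_d\right) \\
&= d\,(1 - \sqrt{s})^2
\end{aligned}
```

Take d = 2048 (the size of Inception-v3 pooled features) and s = 0.01. Then FID = 2048 × (1 - 0.1)² = 2048 × 0.81 = 1658.88, while s = 1 gives 0. The mean term is zero throughout, so the whole penalty comes from the missing spread.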

Good vs Bad Metric Values for Image Generation

Inception Score (IS): Good: > 8 (varied, clear images). Bad: < 3 (blurry, repetitive images).
FID: Good: < 50 (close to real images). Bad: > 100 (poor quality or unrealistic images).

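Good values suggest the model generates images that look real and differ from one another. These cutoffs are rough rules of thumb, though. Absolute values depend on the dataset, image resolution, number of samples, and the exact Inception implementation, so compare scores only across models evaluated under the same setup.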

Common Pitfalls in Image Generation Metrics

Mode Collapse: The model generates only a few kinds of images, so quality can look good while diversity suffers.
Overfitting: The model memorizes training images. This can yield a low FID (especially when the real reference set is the training set) even though the model fails to generalize.
Misleading Scores: IS can be high for images that are sharp and confidently classified but unrealistic; always inspect samples visually.
Data Leakage: Using test images in training can artificially improve metrics computed against that test set.
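One common way to probe for memorization and leakage is a nearest-neighbor check. For each generated image, find the distance to its closest training image in some feature space, then inspect the generated images whose distances are suspiciously small.

The sketch below assumes the features are already extracted, with one row per image. The function names and the `threshold` value are hypothetical. In practice the threshold would need calibrating, for example against the distances from held-out real images to the training set.

```python
import numpy as np

def nearest_train_distances(gen_feats: np.ndarray, train_feats: np.ndarray) -> np.ndarray:
    """Euclidean distance from each generated feature vector to its nearest training vector."""
    # Broadcast to (n_gen, n_train, dim); fine for small sets, chunk gen_feats for large ones
    diffs = gen_feats[:, None, :] - train_feats[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)

def flag_possible_copies(gen_feats: np.ndarray, train_feats: np.ndarray,
                         threshold: float) -> np.ndarray:
    """Indices of generated samples closer than `threshold` to some training sample."""
    dists = nearest_train_distances(gen_feats, train_feats)
    return np.flatnonzero(dists < threshold)

# Hypothetical features: the first "generated" sample is an exact copy of a training sample
rng = np.random.default_rng(0)
train = rng.normal(size=(100, 16))
gen = np.vstack([train[3], rng.normal(size=(4, 16))])
print(flag_possible_copies(gen, train, threshold=0.5))  # flags index 0, the copy
```

A flagged index is only a candidate for inspection. Near-duplicates can also be legitimate when the training data itself contains many similar images.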

Self-Check Question

Your image generation model has an Inception Score of 9 but a FID of 120. Is it good?

Answer: No. The high IS means images look clear and varied, but the very high FID means they are far from real images in distribution. The model likely generates unrealistic images despite variety. You should improve realism.

Key Result

Judge image generation with FID (lower is better) and Inception Score (higher is better) together to balance realism and diversity, and confirm the numbers by looking at generated samples.