In models with a generator and a discriminator, such as GANs (generative adversarial networks), the goal is for the generator to create data that looks real and for the discriminator to tell real from fake. The key metric for both parts is loss: the generator's loss shows how well it fools the discriminator, and the discriminator's loss shows how well it spots fakes. We also track the Inception Score or FID (Fréchet Inception Distance) to measure how realistic the generated data looks overall.
Generator and discriminator in PyTorch - Model Metrics & Evaluation
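To make the two losses concrete, here is a minimal PyTorch sketch of the standard (non-saturating) GAN losses. The discriminator scores below are hypothetical stand-ins; in a real training loop they would come from D(real_batch) and D(G(noise)).

```python
import torch
import torch.nn as nn

# Hypothetical discriminator outputs (probabilities) for one small batch.
d_real = torch.tensor([0.9, 0.8, 0.7])   # D's scores on real samples
d_fake = torch.tensor([0.2, 0.3, 0.1])   # D's scores on generated samples

bce = nn.BCELoss()
ones = torch.ones_like(d_real)
zeros = torch.zeros_like(d_fake)

# Discriminator loss: push real scores toward 1 and fake scores toward 0.
d_loss = bce(d_real, ones) + bce(d_fake, zeros)

# Generator loss (non-saturating form): push D's fake scores toward 1.
g_loss = bce(d_fake, ones)

print(f"D loss: {d_loss.item():.3f}, G loss: {g_loss.item():.3f}")
```

In an actual loop, `d_loss.backward()` updates only the discriminator's parameters and `g_loss.backward()` only the generator's; here we just evaluate the formulas on fixed scores.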
          | Predicted Real | Predicted Fake
----------+----------------+---------------
Real Data |       TP       |       FN
Fake Data |       FP       |       TN
TP = Discriminator correctly says real data is real
TN = Discriminator correctly says fake data is fake
FP = Discriminator wrongly says fake data is real
FN = Discriminator wrongly says real data is fake
Precision = TP / (TP + FP): of the samples the discriminator labels real, the fraction that are truly real.
Recall = TP / (TP + FN): of the truly real samples, the fraction the discriminator correctly identifies.
These help understand discriminator quality.
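The two formulas above can be checked with a few lines of Python. The confusion-matrix counts here are hypothetical, chosen only to illustrate the arithmetic.

```python
# Toy confusion-matrix counts for a discriminator (hypothetical values).
tp, fp, fn, tn = 80, 10, 20, 90

precision = tp / (tp + fp)   # of samples labeled "real", fraction truly real
recall = tp / (tp + fn)      # of truly real samples, fraction found

print(f"precision={precision:.2f}, recall={recall:.2f}")
```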
For the discriminator, high precision means it rarely mistakes fake data for real, and high recall means it correctly identifies most real data. If precision is high but recall is low, the discriminator rejects some real data as fake; if recall is high but precision is low, it accepts many fakes as real.
For the generator, the goal is to reduce discriminator's ability to tell fake from real, so its loss improves when discriminator's precision and recall drop.
Example: In art generation, a discriminator with low recall rejects many genuine styles as fake, while a discriminator with low precision accepts poor fakes as real.
- Good Discriminator: Balanced precision and recall around 0.8-0.9, showing it can spot fakes and recognize real data well.
- Bad Discriminator: Precision or recall below 0.5 means it guesses poorly, hurting training.
- Good Generator: Generator loss decreases steadily, and FID score lowers (closer to 0), meaning generated data looks more real.
- Bad Generator: Generator loss stays high or oscillates, FID score high, meaning poor quality fake data.
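The FID mentioned above is the Fréchet distance between Gaussian fits of real and generated feature sets. The real metric computes this on Inception-v3 activations; the sketch below illustrates just the formula on small synthetic feature vectors, so the absolute numbers are not comparable to published FID scores.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussian fits of two feature sets.
    Real FID applies this to Inception-v3 activations; here we use
    arbitrary toy features purely to illustrate the formula."""
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from sqrtm
    return float(((mu_a - mu_b) ** 2).sum()
                 + np.trace(cov_a + cov_b - 2 * covmean))

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, (500, 4))
close = rng.normal(0.1, 1.0, (500, 4))   # near the real distribution
far = rng.normal(3.0, 1.0, (500, 4))     # far from it

print(frechet_distance(real, close) < frechet_distance(real, far))  # True
```

A lower distance means the generated feature distribution is closer to the real one, which is why a falling FID indicates an improving generator.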
- Mode Collapse: Generator produces limited variety, fooling discriminator but failing diversity. Metrics may look good but output is poor.
- Overfitting Discriminator: Discriminator becomes too strong, generator can't learn. Losses become unstable.
- Ignoring Diversity Metrics: Only tracking loss misses if generator creates varied outputs.
- Data Leakage: Using test data in training can inflate metrics falsely.
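Mode collapse and missing diversity tracking can both be caught with a crude diversity proxy: the average pairwise distance between generated samples, which drops sharply when the generator produces near-duplicates. This is a simple illustrative heuristic, not a standard metric, applied here to synthetic vectors standing in for generator outputs.

```python
import numpy as np

def mean_pairwise_distance(samples):
    """Average pairwise L2 distance; a crude diversity proxy.
    A collapsed generator emits near-identical samples, so this drops."""
    diffs = samples[:, None, :] - samples[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))
    n = len(samples)
    return dists.sum() / (n * (n - 1))

rng = np.random.default_rng(1)
diverse = rng.normal(0, 1, (100, 8))                  # varied outputs
collapsed = (np.tile(rng.normal(0, 1, (1, 8)), (100, 1))
             + rng.normal(0, 0.01, (100, 8)))         # near-duplicates

print(mean_pairwise_distance(diverse) > mean_pairwise_distance(collapsed))  # True
```

Logging a statistic like this alongside the losses helps distinguish a generator that is genuinely improving from one that is collapsing onto a few outputs.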
Your GAN model has a discriminator accuracy of 98% but the generator's FID score is very high, and generated images look very similar. Is this good?
Answer: No. The discriminator is too strong and the generator is not learning well. The high discriminator accuracy means it easily spots fakes. The high FID and similar images show mode collapse. You need to balance training and monitor diversity.