In models with a generator and a discriminator, such as GANs (generative adversarial networks), the goal is for the generator to create data that looks real and for the discriminator to tell real from fake. The key metric for both parts is loss: the generator's loss shows how well it fools the discriminator, and the discriminator's loss shows how well it spots fakes. We also track the Inception Score or FID (Fréchet Inception Distance) to measure how realistic the generated data looks overall.
Generator and discriminator in PyTorch - Model Metrics & Evaluation
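To make the two losses concrete, here is a minimal PyTorch sketch of the standard (non-saturating) GAN losses. The discriminator scores below are hypothetical stand-ins; in a real training loop they would come from D(real_batch) and D(G(noise)).

```python
import torch
import torch.nn as nn

# Hypothetical discriminator outputs (probabilities) for one small batch.
d_real = torch.tensor([0.9, 0.8, 0.7])   # D's scores on real samples
d_fake = torch.tensor([0.2, 0.3, 0.1])   # D's scores on generated samples

bce = nn.BCELoss()
ones = torch.ones_like(d_real)
zeros = torch.zeros_like(d_fake)

# Discriminator loss: push real scores toward 1 and fake scores toward 0.
d_loss = bce(d_real, ones) + bce(d_fake, zeros)

# Generator loss (non-saturating form): push D's fake scores toward 1.
g_loss = bce(d_fake, ones)

print(f"D loss: {d_loss.item():.3f}, G loss: {g_loss.item():.3f}")
```

In an actual loop, `d_loss.backward()` updates only the discriminator's parameters and `g_loss.backward()` only the generator's; here we just evaluate the formulas on fixed scores.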
          | Predicted Real | Predicted Fake
----------+----------------+---------------
Real Data |       TP       |       FN
Fake Data |       FP       |       TN
TP = Discriminator correctly says real data is real
TN = Discriminator correctly says fake data is fake
FP = Discriminator wrongly says fake data is real
FN = Discriminator wrongly says real data is fake
Precision = TP / (TP + FP): of the samples the discriminator labels real, the fraction that are truly real.
Recall = TP / (TP + FN): of the truly real samples, the fraction the discriminator correctly identifies.
These help understand discriminator quality.
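The two formulas above can be checked with a few lines of Python. The confusion-matrix counts here are hypothetical, chosen only to illustrate the arithmetic.

```python
# Toy confusion-matrix counts for a discriminator (hypothetical values).
tp, fp, fn, tn = 80, 10, 20, 90

precision = tp / (tp + fp)   # of samples labeled "real", fraction truly real
recall = tp / (tp + fn)      # of truly real samples, fraction found

print(f"precision={precision:.2f}, recall={recall:.2f}")
```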
For the discriminator, high precision means it rarely mistakes fake data for real, and high recall means it correctly identifies most real data. If precision is high but recall is low, the discriminator rejects some real data as fake; if recall is high but precision is low, it accepts many fakes as real.
For the generator, the goal is to reduce discriminator's ability to tell fake from real, so its loss improves when discriminator's precision and recall drop.
Example: In art generation, a discriminator with low recall rejects many genuine styles as fake, while a discriminator with low precision accepts poor fakes as real.
- Good Discriminator: Balanced precision and recall around 0.8-0.9, showing it can spot fakes and recognize real data well.
- Bad Discriminator: Precision or recall below 0.5 means it guesses poorly, hurting training.
- Good Generator: Generator loss decreases steadily, and FID score lowers (closer to 0), meaning generated data looks more real.
- Bad Generator: Generator loss stays high or oscillates, FID score high, meaning poor quality fake data.
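The FID mentioned above is the Fréchet distance between Gaussian fits of real and generated feature sets. The real metric computes this on Inception-v3 activations; the sketch below illustrates just the formula on small synthetic feature vectors, so the absolute numbers are not comparable to published FID scores.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussian fits of two feature sets.
    Real FID applies this to Inception-v3 activations; here we use
    arbitrary toy features purely to illustrate the formula."""
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from sqrtm
    return float(((mu_a - mu_b) ** 2).sum()
                 + np.trace(cov_a + cov_b - 2 * covmean))

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, (500, 4))
close = rng.normal(0.1, 1.0, (500, 4))   # near the real distribution
far = rng.normal(3.0, 1.0, (500, 4))     # far from it

print(frechet_distance(real, close) < frechet_distance(real, far))  # True
```

A lower distance means the generated feature distribution is closer to the real one, which is why a falling FID indicates an improving generator.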
- Mode Collapse: Generator produces limited variety, fooling discriminator but failing diversity. Metrics may look good but output is poor.
- Overfitting Discriminator: Discriminator becomes too strong, generator can't learn. Losses become unstable.
- Ignoring Diversity Metrics: Only tracking loss misses if generator creates varied outputs.
- Data Leakage: Using test data in training can inflate metrics falsely.
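Mode collapse and missing diversity tracking can both be caught with a crude diversity proxy: the average pairwise distance between generated samples, which drops sharply when the generator produces near-duplicates. This is a simple illustrative heuristic, not a standard metric, applied here to synthetic vectors standing in for generator outputs.

```python
import numpy as np

def mean_pairwise_distance(samples):
    """Average pairwise L2 distance; a crude diversity proxy.
    A collapsed generator emits near-identical samples, so this drops."""
    diffs = samples[:, None, :] - samples[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))
    n = len(samples)
    return dists.sum() / (n * (n - 1))

rng = np.random.default_rng(1)
diverse = rng.normal(0, 1, (100, 8))                  # varied outputs
collapsed = (np.tile(rng.normal(0, 1, (1, 8)), (100, 1))
             + rng.normal(0, 0.01, (100, 8)))         # near-duplicates

print(mean_pairwise_distance(diverse) > mean_pairwise_distance(collapsed))  # True
```

Logging a statistic like this alongside the losses helps distinguish a generator that is genuinely improving from one that is collapsing onto a few outputs.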
Your GAN model has a discriminator accuracy of 98% but the generator's FID score is very high, and generated images look very similar. Is this good?
Answer: No. The discriminator is too strong and the generator is not learning well. The high discriminator accuracy means it easily spots fakes. The high FID and similar images show mode collapse. You need to balance training and monitor diversity.