Generative AI (GenAI) creates new content such as text, images, code, or audio. Evaluating how well a model performs requires different metrics depending on the type of content.
For text, we look at perplexity (how well the model predicts held-out text; lower is better) and BLEU or ROUGE scores (how much n-gram overlap the generated text has with human-written references; higher is better).
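As a rough illustration of the text metrics, the sketch below computes perplexity from the probabilities a model assigned to each correct token, plus the clipped unigram overlap that BLEU-1 and ROUGE-1 build on. The function names and the whitespace tokenization are assumptions made for this example. Full BLEU and ROUGE add higher-order n-grams, a brevity penalty, and other details not shown here.

```python
import math
from collections import Counter


def perplexity(token_probs):
    """Exponential of the average negative log-probability of the true tokens."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


def unigram_overlap(candidate, reference):
    """Clipped unigram precision and recall between two strings.

    Precision is the BLEU-1 style view, recall the ROUGE-1 style view.
    Tokenization is a plain whitespace split (an assumption of this sketch).
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per word ("clipping").
    matched = sum((cand & ref).values())
    precision = matched / max(sum(cand.values()), 1)
    recall = matched / max(sum(ref.values()), 1)
    return precision, recall
```

A model that assigns probability 0.25 to every correct token has a perplexity of about 4, which can be read as "as uncertain as a uniform choice among four tokens".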
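For images, we use FID (Fréchet Inception Distance), which compares the statistics of generated and real images in the feature space of a pretrained Inception network; a lower score means the two sets are more similar.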
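The core of FID is the Fréchet distance between two Gaussians fitted to image features. The sketch below is a simplified version that assumes each feature dimension is independent (diagonal covariance), which reduces the distance to a sum over dimensions. Real FID uses full covariance matrices and features from an Inception network; here the features are plain lists of numbers and the function name is hypothetical.

```python
import math
from statistics import fmean, variance


def frechet_distance_diag(feats_a, feats_b):
    """Fréchet distance between two feature sets under a diagonal-covariance assumption.

    Each argument is a list of equal-length feature vectors (at least two per set).
    Per dimension the term is (mean_a - mean_b)^2 + (std_a - std_b)^2.
    NOTE: this is a simplification; true FID uses full covariance matrices.
    """
    dims = len(feats_a[0])
    total = 0.0
    for d in range(dims):
        col_a = [row[d] for row in feats_a]
        col_b = [row[d] for row in feats_b]
        mean_gap = fmean(col_a) - fmean(col_b)
        std_gap = math.sqrt(variance(col_a)) - math.sqrt(variance(col_b))
        total += mean_gap ** 2 + std_gap ** 2
    return total
```

Identical feature sets give a distance of 0, and the value grows as the means or spreads of the two sets drift apart.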
For code, correctness matters most. We check if generated code runs without errors and passes tests.
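A minimal sketch of this kind of check: load each generated snippet, call the function it defines, and compare the results with expected outputs. The helper name, the sample candidates, and the test cases are all made up for illustration. A real harness would run untrusted code in a sandbox with time and memory limits.

```python
def passes_tests(source, func_name, cases):
    """Return True if the generated code loads and passes every test case.

    `cases` is a list of (args_tuple, expected_result) pairs.
    """
    namespace = {}
    try:
        # WARNING: exec runs arbitrary code; sandbox this in any real setting.
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        # Syntax errors, missing functions, and runtime errors all count as failures.
        return False


# Hypothetical model outputs for the prompt "write add(a, b)".
candidates = [
    "def add(a, b):\n    return a + b",   # correct
    "def add(a, b):\n    return a - b",   # runs, but wrong result
    "def add(a, b) return a + b",         # does not even parse
]
cases = [((1, 2), 3), ((0, 0), 0)]

pass_rate = sum(passes_tests(src, "add", cases) for src in candidates) / len(candidates)
```

Only the first candidate passes, so the pass rate is one in three. The second candidate shows why running without errors is not enough on its own: the tests catch code that executes cleanly but returns the wrong answer.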
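For audio, we measure quality with Mean Opinion Score (MOS), the average of human listener ratings, usually on a 1-to-5 scale, or with signal similarity metrics that compare generated audio to a reference.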
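Both audio measures can be sketched in a few lines. MOS is the mean of listener ratings, and signal-to-noise ratio (SNR) is used here as one example of a signal similarity metric, chosen for illustration. The function names are hypothetical, and the SNR helper assumes the two signals are time-aligned and the same length.

```python
import math
from statistics import fmean


def mean_opinion_score(ratings):
    """Average listener rating, assuming the common 1 (bad) to 5 (excellent) scale."""
    if not all(1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")
    return fmean(ratings)


def snr_db(reference, generated):
    """Signal-to-noise ratio in decibels, treating (reference - generated) as noise.

    Assumes both sequences hold aligned samples of equal length.
    """
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, generated))
    if noise_power == 0:
        return math.inf  # identical signals
    return 10 * math.log10(signal_power / noise_power)
```

MOS needs human listeners, which makes it slow and costly, while signal metrics are automatic but only apply when a reference recording exists.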
Overall, the right metric depends on the content type and what matters most: quality, accuracy, or similarity to real data.