For Top-p and Top-k sampling, the key quality metric is perplexity. Perplexity measures how well the language model predicts the next token: it is the exponential of the average negative log-likelihood, so lower perplexity means the model assigns higher probability to the observed text and is therefore more confident and accurate in its predictions.
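As a minimal sketch of the definition above, the function below computes perplexity from a list of per-token log-probabilities (the helper name `perplexity` and the example values are illustrative, not from any specific library):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the average negative log-probability per token.

    token_logprobs: natural-log probabilities the model assigned to each
    observed token (all values <= 0). Lower perplexity = better prediction.
    """
    n = len(token_logprobs)
    avg_nll = -sum(token_logprobs) / n  # average negative log-likelihood
    return math.exp(avg_nll)

# Example: three tokens with log-probs -0.1, -0.3, -0.2
# average NLL = 0.2, so perplexity = exp(0.2)
print(perplexity([-0.1, -0.3, -0.2]))
```

A model that predicted every token with probability 1 would have log-probs of 0 and perplexity 1, the best possible score.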
Additionally, diversity metrics such as distinct-n (the ratio of unique n-grams to total n-grams) measure how varied the generated text is. Because Top-p and Top-k control sampling randomness, tuning them means balancing perplexity (quality) against diversity: tighter truncation tends to lower perplexity but can make outputs repetitive.
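The distinct-n metric mentioned above can be sketched as follows (the helper name `distinct_n` is an assumption for illustration; a score near 1.0 means highly varied text, near 0.0 means repetitive text):

```python
def distinct_n(tokens, n=2):
    """Fraction of unique n-grams among all n-grams in the token sequence."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)

# "the cat" appears twice among the 5 bigrams, so 4 of 5 are unique -> 0.8
print(distinct_n(["the", "cat", "sat", "on", "the", "cat"], n=2))
```

In practice, distinct-1 and distinct-2 are reported together with perplexity so that a sampling configuration cannot look good by sacrificing one dimension entirely.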