In debate and consensus patterns, the goal is to combine the opinions of multiple models or agents into a single, reliable final decision, so the metrics that matter most are those measuring correctness and agreement.
Accuracy shows how often the final consensus matches the true answer.
Precision and recall show whether the consensus identifies positive cases correctly: precision penalizes wrongly added positives (false positives), while recall penalizes missed positives (false negatives).
The F1 score, the harmonic mean of precision and recall, balances the two and is useful when both false positives and false negatives matter.
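The correctness metrics above can be sketched for a simple majority-vote consensus. The toy vote and label data below are illustrative assumptions, as are the helper names `majority_vote` and `consensus_metrics`:

```python
# Sketch: scoring a majority-vote consensus against ground-truth labels.
# Vote data and function names are illustrative assumptions, not a fixed API.
from collections import Counter

def majority_vote(votes):
    """Return the most common label among the debaters' votes for one item."""
    return Counter(votes).most_common(1)[0][0]

def consensus_metrics(per_item_votes, truth, positive=1):
    """Compute accuracy, precision, recall, and F1 for the consensus labels."""
    preds = [majority_vote(v) for v in per_item_votes]
    tp = sum(p == positive and t == positive for p, t in zip(preds, truth))
    fp = sum(p == positive and t != positive for p, t in zip(preds, truth))
    fn = sum(p != positive and t == positive for p, t in zip(preds, truth))
    accuracy = sum(p == t for p, t in zip(preds, truth)) / len(truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Three debaters voting on four items (toy data).
votes = [[1, 1, 0], [0, 0, 0], [1, 0, 1], [0, 1, 1]]
truth = [1, 0, 0, 1]
print(consensus_metrics(votes, truth))
```

Computing the metrics on the consensus labels rather than on any individual debater is the key point: the ensemble is evaluated as a single decision-maker.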
Agreement metrics such as Cohen's kappa (for two raters) or Fleiss' kappa (for more than two) measure how much the individual debaters agree beyond what chance alone would produce, indicating the strength of the consensus.
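For the two-rater case, Cohen's kappa compares observed agreement against the agreement expected by chance from each rater's label frequencies. A minimal sketch, with toy vote sequences as assumed inputs:

```python
# Sketch: Cohen's kappa for two debaters' label sequences (toy data).
def cohens_kappa(a, b):
    """Kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from each rater's marginal frequencies."""
    n = len(a)
    labels = set(a) | set(b)
    p_o = sum(x == y for x, y in zip(a, b)) / n           # observed agreement
    p_e = sum((a.count(l) / n) * (b.count(l) / n)          # chance agreement
              for l in labels)
    return (p_o - p_e) / (1 - p_e)

debater_a = [1, 1, 0, 0, 1, 0]
debater_b = [1, 0, 0, 0, 1, 1]
print(cohens_kappa(debater_a, debater_b))
```

A kappa of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean the debaters disagree more than chance would predict.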