Fairness metrics help us check whether a model treats different groups equally, so we can catch bias that harms some people unfairly. Common fairness metrics include Demographic Parity (are positive outcomes equally frequent across groups?), Equal Opportunity (are true positive rates equal across groups?), and Equalized Odds (are both true positive and false positive rates equal?). These metrics matter because they show whether the model's decisions are fair, not just accurate.
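As a minimal sketch of how these three metrics reduce to per-group rates (the function name and array layout here are illustrative, not from any particular library):

```python
import numpy as np

def fairness_report(y_true, y_pred, group):
    """Per-group selection rate (Demographic Parity), true positive
    rate (Equal Opportunity), and false positive rate (together with
    TPR, this covers Equalized Odds)."""
    report = {}
    for name in np.unique(group):
        mask = group == name
        yt, yp = y_true[mask], y_pred[mask]
        report[name] = {
            "selection_rate": yp.mean(),   # fraction predicted positive
            "tpr": yp[yt == 1].mean(),     # correct positives / all actual positives
            "fpr": yp[yt == 0].mean(),     # wrong positives / all actual negatives
        }
    return report
```

Comparing the per-group entries of the report then tells you which fairness criterion, if any, is violated: equal selection rates give Demographic Parity, equal TPRs give Equal Opportunity, and equal TPRs plus equal FPRs give Equalized Odds.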
Imagine a model predicting loan approval for two groups: Group A and Group B.
Group A Confusion Matrix:
TP=40 FP=10
FN=5 TN=45
Group B Confusion Matrix:
TP=30 FP=20
FN=15 TN=35
We calculate True Positive Rate (Recall) for each group:
- Group A Recall = 40 / (40 + 5) = 0.89
- Group B Recall = 30 / (30 + 15) = 0.67
The roughly 22-point gap shows that qualified applicants in Group B receive fewer correct positive predictions, indicating possible unfairness.
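The recall figures above can be reproduced directly from the confusion-matrix counts:

```python
def recall(tp, fn):
    # True Positive Rate (Recall) = TP / (TP + FN)
    return tp / (tp + fn)

recall_a = recall(tp=40, fn=5)    # 40 / 45, about 0.89
recall_b = recall(tp=30, fn=15)   # 30 / 45, about 0.67
gap = recall_a - recall_b         # about 0.22
```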
Improving fairness can reduce overall accuracy. A model that favors one group may score higher on accuracy precisely because it exploits that imbalance; constraining it to treat groups equally can lower accuracy but makes its decisions fairer. Understanding and balancing this tradeoff for the problem at hand is essential.
Example: A hiring model might be very accurate but reject many qualified candidates from a minority group. Improving fairness means accepting more candidates from that group, which might slightly reduce accuracy but improves fairness.
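A toy illustration of this tradeoff (the scores and thresholds are invented for the example): if one group's qualified candidates tend to score lower, a single global cutoff under-selects them, and equalizing TPR means lowering that group's threshold, which admits more lower-scoring candidates.

```python
# Hypothetical model scores for qualified (label = 1) candidates.
scores_a = [0.9, 0.8, 0.7, 0.6]   # Group A positives
scores_b = [0.7, 0.6, 0.5, 0.4]   # Group B positives score lower on average

def tpr_at(scores, threshold):
    # Fraction of qualified candidates accepted at this cutoff.
    return sum(s >= threshold for s in scores) / len(scores)

# One global threshold favors Group A:
tpr_a = tpr_at(scores_a, 0.65)       # 3 of 4 accepted
tpr_b = tpr_at(scores_b, 0.65)       # only 1 of 4 accepted

# Lowering Group B's threshold equalizes TPR, at the cost of
# accepting lower-scoring candidates (which can reduce accuracy):
tpr_b_adjusted = tpr_at(scores_b, 0.45)   # now 3 of 4 accepted
```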
Good: Similar true positive rates and false positive rates across groups (e.g., TPR difference < 0.05). This means the model treats groups equally.
Bad: Large differences in metrics (e.g., one group has TPR 0.9, another 0.5). This means the model favors one group and is unfair.
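The rule of thumb above can be encoded as a simple check (the function name and the 0.05 tolerance are illustrative, not a standard API):

```python
def looks_fair(tpr_by_group, fpr_by_group, tol=0.05):
    """Flag a model as unfair if the spread in TPR or FPR across
    groups exceeds the tolerance (here, the 0.05 rule of thumb)."""
    tpr_gap = max(tpr_by_group.values()) - min(tpr_by_group.values())
    fpr_gap = max(fpr_by_group.values()) - min(fpr_by_group.values())
    return tpr_gap <= tol and fpr_gap <= tol

# The "bad" case from the text: TPR 0.9 vs 0.5 fails the check.
bad = looks_fair({"A": 0.9, "B": 0.5}, {"A": 0.10, "B": 0.12})
# A "good" case: gaps well under 0.05 pass it.
good = looks_fair({"A": 0.88, "B": 0.86}, {"A": 0.10, "B": 0.11})
```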
- Ignoring context: Fairness depends on the problem and groups; one metric does not fit all.
- Data imbalance: Small group sizes can make metrics unstable or misleading.
- Accuracy paradox: A model can be accurate but unfair if it ignores minority groups.
- Overfitting fairness: Adjusting too much for fairness on training data can hurt real-world performance.
- Ignoring multiple fairness aspects: Focusing on one metric may hide unfairness in others.
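To see why small group sizes make these metrics unstable (the second pitfall above; numbers invented for illustration): with only five positive examples in a group, a single prediction moves the TPR by 0.2, so a one-person difference can look like a large fairness gap.

```python
# A group with just 5 positive examples: each one moves TPR by 1/5.
tpr_five_positives = 4 / 5        # 4 of 5 positives caught -> 0.80
tpr_one_more_miss = 3 / 5         # a single extra miss     -> 0.60
swing = tpr_five_positives - tpr_one_more_miss   # 0.20 from one prediction
```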
Your model has 95% accuracy overall but the true positive rate for a minority group is 40%, while for the majority group it is 85%. Is this model good for fairness? Why or why not?
Answer: No, this model is not fair. Even though accuracy is high, the minority group has a much lower true positive rate, meaning they get fewer correct positive predictions. This shows bias and unfair treatment.
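Working the quiz numbers directly (using the same 0.05 rule of thumb as earlier):

```python
tpr_majority = 0.85
tpr_minority = 0.40
tpr_gap = tpr_majority - tpr_minority   # 0.45, nine times the 0.05 tolerance
looks_unfair = tpr_gap > 0.05           # True: 95% accuracy hides the disparity
```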