Fairness metrics help us check whether a model treats different groups equally, so we can catch bias that harms some people unfairly. Common fairness metrics include Demographic Parity (are positive outcomes equally frequent across groups?), Equal Opportunity (are true positive rates equal across groups?), and Equalized Odds (are both true positive and false positive rates equal?). These metrics matter because they show whether the model's decisions are fair, not just accurate.
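As a minimal sketch of how these three metrics reduce to per-group rates (the function name and array layout here are illustrative, not from any particular library):

```python
import numpy as np

def fairness_report(y_true, y_pred, group):
    """Per-group selection rate (Demographic Parity), true positive
    rate (Equal Opportunity), and false positive rate (together with
    TPR, this covers Equalized Odds)."""
    report = {}
    for name in np.unique(group):
        mask = group == name
        yt, yp = y_true[mask], y_pred[mask]
        report[name] = {
            "selection_rate": yp.mean(),   # fraction predicted positive
            "tpr": yp[yt == 1].mean(),     # correct positives / all actual positives
            "fpr": yp[yt == 0].mean(),     # wrong positives / all actual negatives
        }
    return report
```

Comparing the per-group entries of the report then tells you which fairness criterion, if any, is violated: equal selection rates give Demographic Parity, equal TPRs give Equal Opportunity, and equal TPRs plus equal FPRs give Equalized Odds.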
Imagine a model predicting loan approval for two groups: Group A and Group B.
Group A Confusion Matrix:
TP=40 FP=10
FN=5 TN=45
Group B Confusion Matrix:
TP=30 FP=20
FN=15 TN=35
We calculate True Positive Rate (Recall) for each group:
- Group A Recall = 40 / (40 + 5) = 0.89
- Group B Recall = 30 / (30 + 15) = 0.67
The roughly 22-point gap shows that qualified applicants in Group B receive fewer correct positive predictions, indicating possible unfairness.
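The recall figures above can be reproduced directly from the confusion-matrix counts:

```python
def recall(tp, fn):
    # True Positive Rate (Recall) = TP / (TP + FN)
    return tp / (tp + fn)

recall_a = recall(tp=40, fn=5)    # 40 / 45, about 0.89
recall_b = recall(tp=30, fn=15)   # 30 / 45, about 0.67
gap = recall_a - recall_b         # about 0.22
```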
Improving fairness can reduce overall accuracy. A model that favors one group may score higher on accuracy precisely because it exploits that imbalance; constraining it to treat groups equally can lower accuracy but makes its decisions fairer. Understanding and balancing this tradeoff for the problem at hand is essential.
Example: A hiring model might be very accurate but reject many qualified candidates from a minority group. Improving fairness means accepting more candidates from that group, which might slightly reduce accuracy but improves fairness.
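A toy illustration of this tradeoff (the scores and thresholds are invented for the example): if one group's qualified candidates tend to score lower, a single global cutoff under-selects them, and equalizing TPR means lowering that group's threshold, which admits more lower-scoring candidates.

```python
# Hypothetical model scores for qualified (label = 1) candidates.
scores_a = [0.9, 0.8, 0.7, 0.6]   # Group A positives
scores_b = [0.7, 0.6, 0.5, 0.4]   # Group B positives score lower on average

def tpr_at(scores, threshold):
    # Fraction of qualified candidates accepted at this cutoff.
    return sum(s >= threshold for s in scores) / len(scores)

# One global threshold favors Group A:
tpr_a = tpr_at(scores_a, 0.65)       # 3 of 4 accepted
tpr_b = tpr_at(scores_b, 0.65)       # only 1 of 4 accepted

# Lowering Group B's threshold equalizes TPR, at the cost of
# accepting lower-scoring candidates (which can reduce accuracy):
tpr_b_adjusted = tpr_at(scores_b, 0.45)   # now 3 of 4 accepted
```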
Good: Similar true positive rates and false positive rates across groups (e.g., TPR difference < 0.05). This means the model treats groups equally.
Bad: Large differences in metrics (e.g., one group has TPR 0.9, another 0.5). This means the model favors one group and is unfair.
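The rule of thumb above can be encoded as a simple check (the function name and the 0.05 tolerance are illustrative, not a standard API):

```python
def looks_fair(tpr_by_group, fpr_by_group, tol=0.05):
    """Flag a model as unfair if the spread in TPR or FPR across
    groups exceeds the tolerance (here, the 0.05 rule of thumb)."""
    tpr_gap = max(tpr_by_group.values()) - min(tpr_by_group.values())
    fpr_gap = max(fpr_by_group.values()) - min(fpr_by_group.values())
    return tpr_gap <= tol and fpr_gap <= tol

# The "bad" case from the text: TPR 0.9 vs 0.5 fails the check.
bad = looks_fair({"A": 0.9, "B": 0.5}, {"A": 0.10, "B": 0.12})
# A "good" case: gaps well under 0.05 pass it.
good = looks_fair({"A": 0.88, "B": 0.86}, {"A": 0.10, "B": 0.11})
```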
- Ignoring context: Fairness depends on the problem and groups; one metric does not fit all.
- Data imbalance: Small group sizes can make metrics unstable or misleading.
- Accuracy paradox: A model can be accurate but unfair if it ignores minority groups.
- Overfitting fairness: Adjusting too much for fairness on training data can hurt real-world performance.
- Ignoring multiple fairness aspects: Focusing on one metric may hide unfairness in others.
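To see why small group sizes make these metrics unstable (the second pitfall above; numbers invented for illustration): with only five positive examples in a group, a single prediction moves the TPR by 0.2, so a one-person difference can look like a large fairness gap.

```python
# A group with just 5 positive examples: each one moves TPR by 1/5.
tpr_five_positives = 4 / 5        # 4 of 5 positives caught -> 0.80
tpr_one_more_miss = 3 / 5         # a single extra miss     -> 0.60
swing = tpr_five_positives - tpr_one_more_miss   # 0.20 from one prediction
```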
Your model has 95% accuracy overall but the true positive rate for a minority group is 40%, while for the majority group it is 85%. Is this model good for fairness? Why or why not?
Answer: No, this model is not fair. Even though accuracy is high, the minority group has a much lower true positive rate, meaning they get fewer correct positive predictions. This shows bias and unfair treatment.
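Working the quiz numbers directly (using the same 0.05 rule of thumb as earlier):

```python
tpr_majority = 0.85
tpr_minority = 0.40
tpr_gap = tpr_majority - tpr_minority   # 0.45, nine times the 0.05 tolerance
looks_unfair = tpr_gap > 0.05           # True: 95% accuracy hides the disparity
```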