Bias detection and fairness metrics in MLOps - Time & Space Complexity
When checking a machine learning model for bias, we compute fairness metrics over groups of records that share a sensitive attribute. Understanding how long these computations take helps us plan and scale our work.
We want to know: how does the time to compute fairness metrics grow as the data size grows?
Analyze the time complexity of the following code snippet.
```python
# Assume data is a list of records, each with a sensitive attribute
# ('group') and a model prediction ('prediction').
sensitive_groups = set(record['group'] for record in data)   # O(n) scan
for group in sensitive_groups:                               # g iterations
    group_data = [r for r in data if r['group'] == group]    # O(n) scan per group
    positive_count = sum(1 for r in group_data if r['prediction'] == 1)
    total_count = len(group_data)
    fairness_metric = positive_count / total_count
    print(f"Group {group}: fairness metric = {fairness_metric}")
```
This code calculates a fairness metric for each sensitive group by filtering data and counting positive predictions.
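To make the behavior concrete, here is a minimal runnable sketch of the same per-group filtering pattern. The sample `data` records and the `metrics` dictionary are assumptions added for illustration; they are not part of the lesson's code.

```python
# Hypothetical sample records (assumed for illustration only)
data = [
    {'group': 'A', 'prediction': 1},
    {'group': 'A', 'prediction': 0},
    {'group': 'B', 'prediction': 1},
    {'group': 'B', 'prediction': 1},
]

metrics = {}
sensitive_groups = set(record['group'] for record in data)
for group in sensitive_groups:
    # Each iteration scans the entire dataset again: g groups x n records
    group_data = [r for r in data if r['group'] == group]
    positive_count = sum(1 for r in group_data if r['prediction'] == 1)
    metrics[group] = positive_count / len(group_data)

print(metrics)
```

With these sample records, group A has a positive rate of 0.5 and group B a rate of 1.0, so the two groups receive positive predictions at different rates.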
Identify the loops, recursion, and array traversals that repeat.
- Primary operation: Looping over each sensitive group and filtering the entire dataset for that group.
- How many times: For each group, the entire dataset is scanned once to filter records.
As the dataset grows, the filtering step repeats for each group, scanning all data each time.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | g × 10 record checks |
| 100 | g × 100 record checks |
| 1000 | g × 1000 record checks |

(where g is the number of sensitive groups)
Pattern observation: The total work grows roughly by the number of groups times the data size, so it grows faster as data or groups increase.
Time Complexity: O(g × n), where g is the number of sensitive groups and n is the number of records.
This means the time to compute fairness metrics grows proportionally with both the number of groups and the size of the data. Space is O(n) in the worst case, since the per-group filter builds a new list that can hold up to n records.
[X] Wrong: "Filtering data for each group is fast because groups are few, so it doesn't affect time much."
[OK] Correct: Even a few groups trigger repeated full scans of the data, so the total time grows with the data size multiplied by the number of groups, which can be costly on large datasets.
Understanding how fairness metric calculations scale helps you design efficient checks in real projects. This skill shows you can think about both data and group sizes when working with fairness in machine learning.
"What if we pre-group the data once instead of filtering each time? How would the time complexity change?"
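One possible answer, sketched below: accumulate the per-group counts in a single pass over the data, then compute the metric for each group. The function name `fairness_metrics` and the dictionary-based return value are choices made for this sketch, not the lesson's prescribed solution.

```python
from collections import defaultdict

def fairness_metrics(data):
    """Compute the positive-prediction rate per group in O(n + g) time."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    # Single pass over all records: O(n)
    for r in data:
        totals[r['group']] += 1
        if r['prediction'] == 1:
            positives[r['group']] += 1
    # One pass over the groups: O(g)
    return {g: positives[g] / totals[g] for g in totals}

# Hypothetical sample records (assumed for illustration only)
data = [
    {'group': 'A', 'prediction': 1},
    {'group': 'A', 'prediction': 0},
    {'group': 'B', 'prediction': 1},
    {'group': 'B', 'prediction': 1},
]
print(fairness_metrics(data))
```

Because each record is examined exactly once, the time complexity drops from O(g × n) to O(n + g), at the cost of O(g) extra space for the count dictionaries.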