Why governance ensures data trust in dbt - Performance Analysis
We want to understand how the time spent checking data governance rules grows as the data grows. How does enforcing governance affect the amount of work a dbt model does?
Analyze the time complexity of the following dbt model with governance checks.
```sql
-- model.sql
select
    user_id,
    count(*) as total_events
from {{ ref('events') }}
where event_date >= '2024-01-01'
  and event_date <= '2024-01-31'
  and is_valid = true -- governance filter
group by user_id
```
This code filters events by date and a governance flag, then counts events per user.
- Primary operation: Scanning each event row to check date and validity.
- How many times: Once for every event in the input table.
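The row-by-row scan described above can be sketched in Python. This is a simulation of what the query does logically, not how a database engine actually executes it, and the sample rows are illustrative:

```python
from collections import defaultdict

def aggregate_events(events, start, end):
    """Simulate the model: one governance check per row, then group by user."""
    checks = 0
    totals = defaultdict(int)
    for row in events:  # a single pass over all n rows
        checks += 1     # every row is examined, whether or not it passes
        if start <= row["event_date"] <= end and row["is_valid"]:
            totals[row["user_id"]] += 1
    return dict(totals), checks

events = [
    {"user_id": 1, "event_date": "2024-01-05", "is_valid": True},
    {"user_id": 1, "event_date": "2024-01-10", "is_valid": False},
    {"user_id": 2, "event_date": "2024-02-01", "is_valid": True},
]
totals, checks = aggregate_events(events, "2024-01-01", "2024-01-31")
# totals == {1: 1}; checks == 3 (every row was examined)
```

Note that even rows that fail the governance filter still cost one check each; filtering reduces the output, not the scan.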
Each row costs a constant amount to check, so as the number of events grows, the total checking work grows in proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 checks |
| 100 | 100 checks |
| 1000 | 1000 checks |
Pattern observation: The work grows directly with the number of events.
Time Complexity: O(n)
This means the time to enforce governance rules grows linearly with data size.
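The pattern in the table can be confirmed with a small simulation. This sketch (synthetic rows, not real dbt execution) counts how many checks a single-pass scan performs for different input sizes:

```python
def governance_scan(events):
    """One pass over (date, is_valid) rows: check date window and validity."""
    checks = 0
    for date, is_valid in events:
        checks += 1  # one date + validity check per row
        if "2024-01-01" <= date <= "2024-01-31" and is_valid:
            pass  # row would flow through to the aggregation
    return checks

# Checks grow one-for-one with rows: 10 -> 10, 100 -> 100, 1000 -> 1000
for n in (10, 100, 1000):
    rows = [("2024-01-15", True)] * n
    assert governance_scan(rows) == n
```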
[X] Wrong: "Governance checks only add a tiny fixed cost, so time stays the same regardless of data size."
[OK] Correct: Each row must be checked, so more data means more work, not a fixed cost.
Understanding how governance affects processing time helps you explain data quality work clearly and confidently.
"What if we added multiple governance filters instead of one? How would the time complexity change?"
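One way to reason about that question: with k filters, each row costs up to k checks, giving O(k · n) total work; since the number of filters is a fixed constant, this is still O(n). A Python sketch (the filter predicates here are hypothetical, not from the model above):

```python
# Hypothetical predicates standing in for multiple governance filters.
filters = [
    lambda row: row["is_valid"],
    lambda row: row["source"] == "trusted",
    lambda row: row["event_date"] >= "2024-01-01",
]

def scan(events, predicates):
    """Apply k predicates to each of n rows: up to k * n checks total."""
    checks = 0
    kept = []
    for row in events:
        passed = True
        for pred in predicates:  # at most k checks per row
            checks += 1
            if not pred(row):
                passed = False   # short-circuit: skip remaining filters
                break
        if passed:
            kept.append(row)
    return kept, checks

rows = [{"is_valid": True, "source": "trusted", "event_date": "2024-01-05"}] * 4
kept, checks = scan(rows, filters)
# 4 rows pass all 3 filters: checks == 12 (k * n), kept has 4 rows
```

Short-circuiting means a row that fails an early filter costs fewer checks, but the upper bound stays O(k · n), which for constant k is still linear in the number of rows.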