0
0
dbtdata~5 mins

Built-in tests (unique, not_null, accepted_values, relationships) in dbt - Time & Space Complexity

Choose your learning style9 modes available
Time Complexity: Built-in tests (unique, not_null, accepted_values, relationships)
O(n)
Understanding Time Complexity

We want to understand how the time to run dbt's built-in tests changes as the data grows.

How does the test execution time grow when the input data size increases?

Scenario Under Consideration

Analyze the time complexity of these dbt built-in tests.


-- Unique test example
select id from {{ ref('my_table') }} group by id having count(*) > 1

-- Not null test example
select id from {{ ref('my_table') }} where id is null

-- Accepted values test example
select id from {{ ref('my_table') }} where status not in ('active', 'inactive')

-- Relationships test example
select child.id from {{ ref('child_table') }} child
left join {{ ref('parent_table') }} parent on child.parent_id = parent.id
where parent.id is null
    

These tests check for duplicates, missing values, invalid values, and broken links between tables.

Identify Repeating Operations

Look at what repeats as data grows.

  • Primary operation: Scanning all rows in the table(s).
  • How many times: Once per test, each row is checked or grouped.
How Execution Grows With Input

As the number of rows grows, the work grows roughly the same amount.

Input Size (n)Approx. Operations
10About 10 row checks
100About 100 row checks
1000About 1000 row checks

Pattern observation: The time grows linearly with the number of rows.

Final Time Complexity

Time Complexity: O(n)

This means the time to run these tests grows directly in proportion to the number of rows checked.

Common Mistake

[X] Wrong: "These tests run instantly no matter how big the data is."

[OK] Correct: Each test must look at every row or group, so more data means more work and more time.

Interview Connect

Understanding how test time grows helps you explain performance in real projects and shows you think about data scale practically.

Self-Check

"What if the accepted_values test checked against a list that grows with the data size? How would the time complexity change?"