Built-in tests (unique, not_null, accepted_values, relationships) in dbt - Time & Space Complexity
We want to understand how the time to run dbt's built-in tests changes as the data grows.
How does the test execution time grow when the input data size increases?
Analyze the time complexity of these dbt built-in tests.
-- Unique test example
select id from {{ ref('my_table') }} group by id having count(*) > 1
-- Not null test example
select id from {{ ref('my_table') }} where id is null
-- Accepted values test example
select id from {{ ref('my_table') }} where status not in ('active', 'inactive')
-- Relationships test example
select child.id from {{ ref('child_table') }} child
left join {{ ref('parent_table') }} parent on child.parent_id = parent.id
where parent.id is null
These tests check for duplicates, missing values, invalid values, and broken links between tables.
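To see what these queries return, here is a minimal sketch that runs them against an in-memory SQLite database. The sample rows are made up for illustration, and plain table names stand in for the `{{ ref(...) }}` calls, which only dbt can resolve. A test passes when its query returns no rows.

```python
import sqlite3

# Hypothetical sample data; table names replace the {{ ref(...) }} calls.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table my_table (id integer, status text);
    insert into my_table values
        (1, 'active'), (2, 'inactive'), (2, 'deleted'), (null, 'active');
    create table parent_table (id integer);
    insert into parent_table values (10), (20);
    create table child_table (id integer, parent_id integer);
    insert into child_table values (1, 10), (2, 99);
""")


def failing_rows(sql):
    """Return the rows a test query selects; an empty list means the test passes."""
    return conn.execute(sql).fetchall()


# Duplicate ids
unique_failures = failing_rows(
    "select id from my_table group by id having count(*) > 1")
# Missing ids
not_null_failures = failing_rows(
    "select id from my_table where id is null")
# Statuses outside the allowed list
accepted_failures = failing_rows(
    "select id from my_table where status not in ('active', 'inactive')")
# Child rows whose parent_id has no matching parent
relationship_failures = failing_rows(
    "select child.id from child_table child "
    "left join parent_table parent on child.parent_id = parent.id "
    "where parent.id is null")

print(unique_failures, not_null_failures, accepted_failures, relationship_failures)
```

Each query reads its table once, which is the pattern the analysis below relies on.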
Look at what repeats as data grows.
- Primary operation: Scanning all rows in the table(s).
- How many times: Once per test, each row is checked or grouped.
As the number of rows grows, the work grows in roughly the same proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 row checks |
| 100 | About 100 row checks |
| 1000 | About 1000 row checks |
Pattern observation: The time grows linearly with the number of rows.
Time Complexity: O(n)
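This means the time to run these tests grows roughly in proportion to the number of rows checked. The estimate assumes the warehouse groups and joins with hashing; an engine that sorts to evaluate the unique test's group by does closer to O(n log n) work, and the relationships test reads both the child and the parent table, so its cost depends on the combined row count.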
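The linear pattern can be checked with a small simulation. This is a sketch in plain Python, not what a warehouse actually executes: the `unique_check` function is a hypothetical stand-in that assumes hash-based grouping and counts one "row check" per input row.

```python
from collections import Counter


def unique_check(values):
    """Mimic the unique test: count each value, then report values seen more than once."""
    counts = Counter()
    row_checks = 0
    for value in values:
        counts[value] += 1  # one hash lookup per row
        row_checks += 1
    duplicates = [value for value, count in counts.items() if count > 1]
    return duplicates, row_checks


# Row checks for the input sizes used in the table above.
growth = {n: unique_check(range(n))[1] for n in (10, 100, 1000)}
print(growth)
```

Multiplying the input size by ten multiplies the row checks by ten, which matches the table.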
[X] Wrong: "These tests run instantly no matter how big the data is."
[OK] Correct: Each test must look at every row or group, so more data means more work and more time.
Understanding how test time grows helps you explain performance in real projects and shows you think about data scale practically.
"What if the accepted_values test checked against a list that grows with the data size? How would the time complexity change?"
Practice
What does the unique test in dbt check for in a column?
Solution
Step 1: Understand the purpose of the unique test
The unique test ensures that each value in the specified column appears only once, meaning no duplicates.
Step 2: Compare with other test types
The other tests cover different things: not_null checks for missing values, accepted_values checks for allowed values, and relationships checks for foreign key matches.
Final Answer:
It checks that all values in the column are different with no duplicates. -> Option D
Quick Check:
unique test = no duplicates [OK]
- Confusing unique with not_null test
- Thinking unique checks accepted values
- Mixing unique with relationships test
Which is the correct way to add a not_null test on the column user_id in a dbt model YAML file?
Solution
Step 1: Recall YAML structure for dbt tests
Tests are added under the columns list; each column has a name and a tests list of test names.
Step 2: Identify correct indentation and keys
The correct option is:
columns:
  - name: user_id
    tests:
      - not_null
It uses 'name' for the column and 'tests' as a list containing '- not_null'. Other options have wrong keys or structure.
Final Answer:
The block above, with not_null listed under the user_id column's tests -> Option A
Quick Check:
YAML tests under columns with name and tests list [OK]
- Using 'test' instead of 'tests'
- Incorrect indentation breaking YAML
- Placing tests outside columns section
columns:
  - name: status
    tests:
      - accepted_values:
          values: ['active', 'inactive', 'pending']
What happens if the status column contains the value 'deleted' when you run dbt test?
Solution
Step 1: Understand accepted_values test behavior
The accepted_values test checks if all column values are within the specified list.
Step 2: Check if 'deleted' is in the list
'deleted' is not in ['active', 'inactive', 'pending'], so the test will fail.
Final Answer:
The test fails because 'deleted' is not in the accepted values list. -> Option B
Quick Check:
accepted_values rejects values outside list [OK]
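The behavior in this question can be reproduced with a small sketch. It assumes a hypothetical `my_model` table in SQLite and uses the same `not in` filter as the accepted_values example query earlier on this page; it is not the exact SQL dbt generates.

```python
import sqlite3

# Hypothetical model with one disallowed status and one NULL status.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table my_model (id integer, status text);
    insert into my_model values
        (1, 'active'), (2, 'pending'), (3, 'deleted'), (4, null);
""")

accepted = ("active", "inactive", "pending")
placeholders = ", ".join("?" for _ in accepted)

# Rows returned here are the failures; any row means the test fails.
failures = conn.execute(
    f"select id, status from my_model where status not in ({placeholders})",
    accepted,
).fetchall()
test_passed = len(failures) == 0

print(failures, test_passed)
```

Only the 'deleted' row is returned. The NULL row is not flagged, because `NULL not in (...)` evaluates to NULL rather than true in SQL, so pairing accepted_values with not_null is a reasonable choice when missing values also matter.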
- Assuming test passes if value is a string
- Confusing accepted_values with not_null
- Thinking test skips unknown values
columns:
  - name: order_id
    tests:
      - relationships:
          to: ref('orders')
But running dbt test gives an error. What is the most likely cause?
Solution
Step 1: Understand relationships test syntax
The relationships test requires both 'to' (target table) and 'field' (target column).
Step 2: Identify the error cause
The YAML is missing the 'field' key, causing a configuration error when running dbt test.
Final Answer:
The 'field' key is missing in the relationships test. -> Option A
Quick Check:
relationships 'to' + 'field' required [OK]
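To illustrate why both keys are needed, here is a hypothetical helper (not part of dbt) that builds a relationships-style query. Without `field` there is no parent column to join on, so the query cannot be written. The model name `order_items` is an assumption, since the question does not name the model being tested.

```python
def relationships_sql(model, column, to=None, field=None):
    """Hypothetical helper: build a relationships-style check from `to` and `field`."""
    missing = [name for name, value in (("to", to), ("field", field)) if value is None]
    if missing:
        raise ValueError(f"relationships test is missing: {', '.join(missing)}")
    # Select child values that have no matching value in the parent column.
    return (
        f"select child.{column} from {model} child "
        f"left join {to} parent on child.{column} = parent.{field} "
        f"where child.{column} is not null and parent.{field} is null"
    )


# With both arguments the join condition can be formed.
query = relationships_sql("order_items", "order_id", to="orders", field="id")
print(query)

# With `field` omitted, as in the YAML above, the helper refuses to build the query.
try:
    relationships_sql("order_items", "order_id", to="orders")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```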
- Writing a bare table name in 'to' instead of ref('orders')
- Omitting the 'field' key
- Assuming 'field' must match column name
You want to ensure that the customer_id column in your orders model is unique, not null, and only contains values that exist in the customers table's id column. Which combination of built-in tests should you add in your YAML?
Solution
Step 1: Identify tests for uniqueness and non-null
Use 'unique' to ensure no duplicates and 'not_null' to prevent missing values.
Step 2: Ensure foreign key relationship
Use the 'relationships' test with 'to' pointing at the customers model and 'field' set to 'id' to check that each value exists there.
Step 3: Verify other options
The other options misuse accepted_values or mix concepts incorrectly.
Final Answer:
- unique
- not_null
- relationships:
    to: ref('customers')
    field: id
-> Option C
Quick Check:
unique + not_null + relationships = correct tests [OK]
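The combination can be tried out on sample data. The sketch below uses made-up rows in SQLite and hand-written queries that approximate the three tests; the unique and relationships queries skip NULLs here so that each problem is reported by exactly one check.

```python
import sqlite3

# Hypothetical data: customer 2 appears twice, one order has no customer,
# and customer 9 does not exist in customers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customers (id integer);
    insert into customers values (1), (2), (3);
    create table orders (order_id integer, customer_id integer);
    insert into orders values
        (100, 1), (101, 2), (102, 2), (103, null), (104, 9);
""")

checks = {
    "unique": (
        "select customer_id from orders where customer_id is not null "
        "group by customer_id having count(*) > 1"
    ),
    "not_null": "select order_id from orders where customer_id is null",
    "relationships": (
        "select o.customer_id from orders o "
        "left join customers c on o.customer_id = c.id "
        "where o.customer_id is not null and c.id is null"
    ),
}

# Map each test name to the rows that would make it fail.
results = {name: conn.execute(sql).fetchall() for name, sql in checks.items()}
print(results)
```

Each test catches a different problem, which is why all three are needed together.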
- Using accepted_values to check null or uniqueness
- Misconfiguring relationships test
- Missing one of the required tests
