
Built-in tests (unique, not_null, accepted_values, relationships) in dbt - Time & Space Complexity

Time Complexity: Built-in tests (unique, not_null, accepted_values, relationships)
O(n)
Understanding Time Complexity

We want to understand how the time to run dbt's built-in tests changes as the data grows.

How does the test execution time grow when the input data size increases?

Scenario Under Consideration

Analyze the time complexity of these dbt built-in tests.


-- Unique test example
select id from {{ ref('my_table') }} group by id having count(*) > 1

-- Not null test example
select id from {{ ref('my_table') }} where id is null

-- Accepted values test example
select id from {{ ref('my_table') }} where status not in ('active', 'inactive')

-- Relationships test example (null parent_id values are skipped, as in dbt's built-in test)
select child.id from {{ ref('child_table') }} child
left join {{ ref('parent_table') }} parent on child.parent_id = parent.id
where child.parent_id is not null and parent.id is null
    

These tests check for duplicates, missing values, invalid values, and broken links between tables.
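For context, queries like these are usually not written by hand; dbt generates them from test declarations in a YAML properties file. A minimal sketch of declarations that would produce tests similar to the four above, reusing the model and column names from the queries (the file name is hypothetical, and newer dbt releases also accept `data_tests:` in place of `tests:`):

```yaml
# models/schema.yml (hypothetical file name)
version: 2

models:
  - name: my_table
    columns:
      - name: id
        tests:
          - unique      # roughly the "group by id having count(*) > 1" query
          - not_null    # roughly the "where id is null" query
      - name: status
        tests:
          - accepted_values:
              values: ['active', 'inactive']

  - name: child_table
    columns:
      - name: parent_id
        tests:
          - relationships:
              to: ref('parent_table')   # parent model
              field: id                 # column in the parent model
```

Each declared test compiles to one query, so four declarations mean four separate scans when `dbt test` runs.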

Identify Repeating Operations

Look at what repeats as data grows.

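  • Primary operation: Scanning all rows in the table(s) the test reads.
  • How many times: Each test scans its table once; every row is filtered (not_null, accepted_values), grouped (unique), or matched against the parent table (relationships) a single time.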
How Execution Grows With Input

As the number of rows grows, the work grows in roughly the same proportion.

Input Size (n)    Approx. Operations
10                About 10 row checks
100               About 100 row checks
1000              About 1000 row checks

Pattern observation: The time grows linearly with the number of rows.

Final Time Complexity

Time Complexity: O(n)

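This means the time to run these tests grows in direct proportion to the number of rows checked. For the relationships test, n covers the rows of both the child and the parent table. The linear estimate assumes the warehouse groups and joins with hashing; a sort-based plan can add a logarithmic factor to unique and relationships.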
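Because the cost tracks the number of rows scanned, one practical lever is to test fewer rows. A minimal sketch using dbt's `where` test config, assuming a hypothetical `created_at` column on my_table and a date literal format that your warehouse accepts:

```yaml
models:
  - name: my_table
    columns:
      - name: id
        tests:
          - not_null:
              config:
                # Only rows matching this filter are tested
                # (created_at and the date are hypothetical)
                where: "created_at >= '2024-01-01'"
```

Whether this shortens the run depends on how well the warehouse can skip data that the filter excludes (for example through partitioning or clustering). The test is still linear in the rows that remain.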

Common Mistake

[X] Wrong: "These tests run instantly no matter how big the data is."

[OK] Correct: Each test must scan every row of the column it checks, so more data means more work and more time.

Interview Connect

Understanding how test time grows helps you explain performance in real projects and shows you think about data scale practically.

Self-Check

"What if the accepted_values test checked against a list that grows with the data size? How would the time complexity change?"

Practice

1. What does the built-in unique test in dbt check for in a column?
easy
A. It checks that the column has no missing (null) values.
B. It checks that the column values exist in another table's column.
C. It checks that the column values match a predefined list of accepted values.
D. It checks that all values in the column are different with no duplicates.

Solution

  1. Step 1: Understand the purpose of the unique test

    The unique test ensures that each value in the specified column appears only once, meaning no duplicates.
  2. Step 2: Compare with other test types

    The other tests do different jobs: not_null checks for missing values, accepted_values checks for allowed values, and relationships checks for foreign key matches.
  3. Final Answer:

    It checks that all values in the column are different with no duplicates. -> Option D
  4. Quick Check:

    unique test = no duplicates [OK]
Hint: Unique means no duplicates allowed in the column [OK]
Common Mistakes:
  • Confusing unique with not_null test
  • Thinking unique checks accepted values
  • Mixing unique with relationships test
2. Which of the following is the correct syntax to add a not_null test on the column user_id in a dbt model YAML file?
easy
A. columns: - name: user_id tests: - not_null
B. columns: - user_id: tests: - not_null
C. tests: - not_null: user_id
D. columns: - name: user_id test: not_null

Solution

  1. Step 1: Recall YAML structure for dbt tests

    Tests are added under the columns list, each column has a name and a tests list with test names.
  2. Step 2: Identify correct indentation and keys

    columns: - name: user_id tests: - not_null correctly uses 'name' for the column and 'tests' as a list with '- not_null'. Other options have wrong keys or structure.
  3. Final Answer:

    columns: - name: user_id tests: - not_null -> Option A
  4. Quick Check:

    YAML tests under columns with name and tests list [OK]
Hint: Use 'name' and 'tests' keys with proper indentation [OK]
Common Mistakes:
  • Using 'test' instead of 'tests'
  • Incorrect indentation breaking YAML
  • Placing tests outside columns section
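Since the answer options are shown on a single line, it may help to see Option A with its real indentation. A minimal sketch, assuming a hypothetical model named users:

```yaml
models:
  - name: users            # hypothetical model name
    columns:
      - name: user_id      # 'name' identifies the column
        tests:             # 'tests' is a list, so each test starts with '-'
          - not_null
```

The `tests` key sits at the same indentation level as `name`, inside the column's list item. Moving it left or right changes which object it belongs to.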
3. Given this YAML snippet in a dbt model:
columns:
  - name: status
    tests:
      - accepted_values:
          values: ['active', 'inactive', 'pending']
What happens if the status column contains the value 'deleted' when you run dbt test?
medium
A. The test passes because 'deleted' is a valid string.
B. The test fails because 'deleted' is not in the accepted values list.
C. The test is skipped because accepted_values only checks for nulls.
D. The test throws a syntax error due to incorrect YAML.

Solution

  1. Step 1: Understand accepted_values test behavior

    The accepted_values test checks if all column values are within the specified list.
  2. Step 2: Check if 'deleted' is in the list

    'deleted' is not in ['active', 'inactive', 'pending'], so the test will fail.
  3. Final Answer:

    The test fails because 'deleted' is not in the accepted values list. -> Option B
  4. Quick Check:

    accepted_values rejects values outside list [OK]
Hint: Accepted_values fails if any value is outside the list [OK]
Common Mistakes:
  • Assuming test passes if value is a string
  • Confusing accepted_values with not_null
  • Thinking test skips unknown values
4. You wrote this test in your dbt model YAML:
columns:
  - name: order_id
    tests:
      - relationships:
          to: ref('orders')
But running dbt test gives an error. What is the most likely cause?
medium
A. The 'field' key is missing in the relationships test.
B. The 'to' value should be a string, not a ref function.
C. The relationships test requires the 'field' to be the same as the column name.
D. The 'to' value must be a table name string, not a ref function.

Solution

  1. Step 1: Understand relationships test syntax

    The relationships test requires both 'to' (target table) and 'field' (target column).
  2. Step 2: Identify the error cause

    The YAML is missing the 'field' key, causing a configuration error when running dbt test.
  3. Final Answer:

    The 'field' key is missing in the relationships test. -> Option A
  4. Quick Check:

    relationships 'to' + 'field' required [OK]
Hint: relationships test requires 'to' and 'field' keys [OK]
Common Mistakes:
  • Assuming ref() is not allowed in the 'to' value
  • Omitting the 'field' key
  • Assuming 'field' must match column name
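A corrected version of the snippet from this question could look like the sketch below. The value of `field` is an assumption: it should be whichever column in orders holds the matching key, and it does not have to share the tested column's name.

```yaml
columns:
  - name: order_id
    tests:
      - relationships:
          to: ref('orders')
          field: order_id   # assumed name of the matching column in orders
```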
5. You want to ensure that the customer_id column in your orders model is unique, not null, and only contains values that exist in the customers table's id column. Which combination of built-in tests should you add in your YAML?
hard
A. - not_null - accepted_values: values: [unique] - relationships: to: ref('customers') field: id
B. - unique - accepted_values: values: [not null] - relationships: to: ref('customers') field: id
C. - unique - not_null - relationships: to: ref('customers') field: id
D. - unique - not_null - accepted_values: values: [customer_id]

Solution

  1. Step 1: Identify tests for uniqueness and non-null

    Use 'unique' to ensure no duplicates and 'not_null' to prevent missing values.
  2. Step 2: Ensure foreign key relationship

    Use the 'relationships' test with 'to' set to ref('customers') and 'field' set to 'id' to check that every value exists in the customers table.
  3. Step 3: Verify other options

    Options A, B, and D misuse accepted_values or mix concepts incorrectly.
  4. Final Answer:

    - unique - not_null - relationships: to: ref('customers') field: id -> Option C
  5. Quick Check:

    unique + not_null + relationships = correct tests [OK]
Hint: Combine unique, not_null, and relationships for full check [OK]
Common Mistakes:
  • Using accepted_values to check null or uniqueness
  • Misconfiguring relationships test
  • Missing one of the required tests
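Written out with full indentation, the combination from this question might look like the following sketch. It uses `ref('customers')` for the `to` value, the form shown in dbt's documentation, and takes the model and column names from the question itself.

```yaml
models:
  - name: orders
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
          - relationships:
              to: ref('customers')
              field: id
```

The three tests run as three independent queries, so any one of them can fail without affecting the others.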