Bird
Raised Fist0
dbtdata~3 mins

Why Built-in tests (unique, not_null, accepted_values, relationships) in dbt? - Purpose & Use Cases

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
The Big Idea

What if a simple test could save you hours of painful data cleanup?

The Scenario

Imagine you have a huge spreadsheet with thousands of rows of sales data. You want to make sure every order ID is unique, no customer ID is missing, and all product categories are valid. Doing this by scanning rows one by one or writing complex manual checks is exhausting and easy to mess up.

The Problem

Manually checking data quality is slow and error-prone. You might miss duplicates or forget to check some columns. It's hard to keep track of all rules, and fixing errors after analysis wastes time and causes wrong decisions.

The Solution

Built-in tests in dbt let you quickly and reliably check your data with simple commands. They automatically verify uniqueness, missing values, allowed values, and relationships between tables. This saves time, reduces mistakes, and keeps your data trustworthy.

Before vs After
Before
SELECT order_id, COUNT(*) FROM sales GROUP BY order_id HAVING COUNT(*) > 1;
After
tests:
  - unique:
      column_name: order_id
What It Enables

With built-in tests, you can confidently trust your data and focus on insights instead of hunting errors.

Real Life Example

A retail company uses dbt built-in tests to ensure every transaction has a unique ID, no customer info is missing, and product categories match the approved list before running sales reports.

Key Takeaways

Manual data checks are slow and risky.

Built-in tests automate common data validations.

This leads to faster, more reliable data analysis.

Practice

(1/5)
1. What does the built-in unique test in dbt check for in a column?
easy
A. It checks that the column has no missing (null) values.
B. It checks that the column values exist in another table's column.
C. It checks that the column values match a predefined list of accepted values.
D. It checks that all values in the column are different with no duplicates.

Solution

  1. Step 1: Understand the purpose of the unique test

    The unique test ensures that each value in the specified column appears only once, meaning no duplicates.
  2. Step 2: Compare with other test types

    Other tests like not_null check for missing values, accepted_values check for allowed values, and relationships check for foreign key matches.
  3. Final Answer:

    It checks that all values in the column are different with no duplicates. -> Option D
  4. Quick Check:

    unique test = no duplicates [OK]
Hint: Unique means no duplicates allowed in the column [OK]
Common Mistakes:
  • Confusing unique with not_null test
  • Thinking unique checks accepted values
  • Mixing unique with relationships test
2. Which of the following is the correct syntax to add a not_null test on the column user_id in a dbt model YAML file?
easy
A. columns: - name: user_id tests: - not_null
B. columns: - user_id: tests: - not_null
C. tests: - not_null: user_id
D. columns: - name: user_id test: not_null

Solution

  1. Step 1: Recall YAML structure for dbt tests

    Tests are added under the columns list, each column has a name and a tests list with test names.
  2. Step 2: Identify correct indentation and keys

    columns: - name: user_id tests: - not_null correctly uses 'name' for the column and 'tests' as a list with '- not_null'. Other options have wrong keys or structure.
  3. Final Answer:

    columns: - name: user_id tests: - not_null -> Option A
  4. Quick Check:

    YAML tests under columns with name and tests list [OK]
Hint: Use 'name' and 'tests' keys with proper indentation [OK]
Common Mistakes:
  • Using 'test' instead of 'tests'
  • Incorrect indentation breaking YAML
  • Placing tests outside columns section
3. Given this YAML snippet in a dbt model:
columns:
  - name: status
    tests:
      - accepted_values:
          values: ['active', 'inactive', 'pending']
What happens if the status column contains the value 'deleted' when you run dbt test?
medium
A. The test passes because 'deleted' is a valid string.
B. The test fails because 'deleted' is not in the accepted values list.
C. The test is skipped because accepted_values only checks for nulls.
D. The test throws a syntax error due to incorrect YAML.

Solution

  1. Step 1: Understand accepted_values test behavior

    The accepted_values test checks if all column values are within the specified list.
  2. Step 2: Check if 'deleted' is in the list

    'deleted' is not in ['active', 'inactive', 'pending'], so the test will fail.
  3. Final Answer:

    The test fails because 'deleted' is not in the accepted values list. -> Option B
  4. Quick Check:

    accepted_values rejects values outside list [OK]
Hint: Accepted_values fails if any value is outside the list [OK]
Common Mistakes:
  • Assuming test passes if value is a string
  • Confusing accepted_values with not_null
  • Thinking test skips unknown values
4. You wrote this test in your dbt model YAML:
columns:
  - name: order_id
    tests:
      - relationships:
          to: ref('orders')
But running dbt test gives an error. What is the most likely cause?
medium
A. The 'field' key is missing in the relationships test.
B. The 'to' value should be a string, not a ref function.
C. The relationships test requires the 'field' to be the same as the column name.
D. The 'to' value must be a table name string, not a ref function.

Solution

  1. Step 1: Understand relationships test syntax

    The relationships test requires both 'to' (target table) and 'field' (target column).
  2. Step 2: Identify the error cause

    The YAML is missing the 'field' key, causing a configuration error when running dbt test.
  3. Final Answer:

    The 'field' key is missing in the relationships test. -> Option A
  4. Quick Check:

    relationships 'to' + 'field' required [OK]
Hint: relationships test requires 'to' and 'field' keys [OK]
Common Mistakes:
  • Using ref() in YAML instead of table name string
  • Omitting the 'field' key
  • Assuming 'field' must match column name
5. You want to ensure that the customer_id column in your orders model is unique, not null, and only contains values that exist in the customers table's id column. Which combination of built-in tests should you add in your YAML?
hard
A. - not_null - accepted_values: values: [unique] - relationships: to: customers field: id
B. - unique - accepted_values: values: [not null] - relationships: to: customers field: id
C. - unique - not_null - relationships: to: customers field: id
D. - unique - not_null - accepted_values: values: [customer_id]

Solution

  1. Step 1: Identify tests for uniqueness and non-null

    Use 'unique' to ensure no duplicates and 'not_null' to prevent missing values.
  2. Step 2: Ensure foreign key relationship

    Use 'relationships' test with 'to' as 'customers' table and 'field' as 'id' to check existence.
  3. Step 3: Verify other options

    Options B, C, and D misuse accepted_values or mix concepts incorrectly.
  4. Final Answer:

    - unique - not_null - relationships: to: customers field: id -> Option C
  5. Quick Check:

    unique + not_null + relationships = correct tests [OK]
Hint: Combine unique, not_null, and relationships for full check [OK]
Common Mistakes:
  • Using accepted_values to check null or uniqueness
  • Misconfiguring relationships test
  • Missing one of the required tests