Bird
Raised Fist0
dbtdata~15 mins

Built-in tests (unique, not_null, accepted_values, relationships) in dbt - Deep Dive

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Overview - Built-in tests (unique, not_null, accepted_values, relationships)
What is it?
Built-in tests in dbt are pre-made checks that help you find problems in your data automatically. They check if values are unique, not missing, within allowed options, or if relationships between tables are correct. These tests run every time you build your data models to keep data trustworthy. They save time by catching errors early without writing complex code.
Why it matters
Without these tests, bad data can silently cause wrong decisions, wasted effort, and lost trust. Built-in tests make it easy to spot data issues before they reach reports or dashboards. This means teams can rely on data confidently and fix problems quickly, improving overall data quality and business outcomes.
Where it fits
Before using built-in tests, you should understand basic SQL and how dbt models work. After mastering these tests, you can learn custom tests and advanced data quality frameworks. This topic fits in the data validation and quality assurance part of the data engineering and analytics workflow.
Mental Model
Core Idea
Built-in tests are automatic data quality checks that verify key rules about your data to keep it accurate and reliable.
Think of it like...
Imagine a factory quality inspector who checks every product for defects like missing parts, duplicates, or wrong sizes before shipping. Built-in tests do the same for your data, catching problems early.
┌───────────────┐      ┌───────────────┐      ┌───────────────┐
│   Data Table  │─────▶│ Built-in Test │─────▶│ Test Result   │
│ (your data)   │      │ (unique,      │      │ (pass/fail)   │
│               │      │  not_null,    │      │               │
│               │      │  accepted_    │      │               │
│               │      │  values,      │      │               │
│               │      │  relationships)│      │               │
└───────────────┘      └───────────────┘      └───────────────┘
Build-Up - 6 Steps
1
FoundationUnderstanding Unique Test Basics
🤔
Concept: The unique test checks if all values in a column are different, with no repeats.
In dbt, the unique test ensures that a column meant to identify records uniquely, like an ID, has no duplicates. You add it by specifying the column in your model's schema file under tests: unique. When run, dbt queries the data to find any repeated values.
Result
If duplicates exist, the test fails and shows which values repeat. If all values are unique, the test passes.
Understanding uniqueness is key because duplicate IDs or keys can cause wrong joins and data confusion downstream.
2
FoundationGrasping Not Null Test Purpose
🤔
Concept: The not_null test checks that a column has no missing (null) values.
Some columns must always have a value, like user email or order date. The not_null test in dbt checks these columns for any nulls. You add it similarly in the schema file under tests: not_null. dbt runs a query to find any rows missing values in that column.
Result
If any nulls are found, the test fails and lists the problem rows. If none, it passes.
Knowing which columns cannot be empty helps prevent errors in calculations or lookups that assume data is complete.
3
IntermediateUsing Accepted Values Test
🤔Before reading on: do you think accepted_values test only checks for exact matches or can it handle ranges? Commit to your answer.
Concept: The accepted_values test checks if column values are only from a specific allowed list.
This test is useful for columns with limited valid options, like status or category. You define the allowed values in the schema file under tests: accepted_values with a list. dbt then finds any values outside this list.
Result
If invalid values exist, the test fails and shows them. If all values are allowed, it passes.
Knowing how to restrict values prevents data entry errors and keeps categories consistent for analysis.
4
IntermediateExploring Relationships Test
🤔Before reading on: do you think relationships test checks if values exist in another table or if they match exactly? Commit to your answer.
Concept: The relationships test checks if values in one table appear in another, enforcing foreign key rules.
This test ensures referential integrity, like making sure every order's customer ID exists in the customers table. You specify the column and the related model and column in the schema file under tests: relationships. dbt queries for values missing in the related table.
Result
If unmatched values are found, the test fails and lists them. If all values match, it passes.
Understanding relationships tests helps maintain data consistency across tables, avoiding orphan records.
5
AdvancedCombining Multiple Built-in Tests
🤔Before reading on: do you think running multiple tests together slows down dbt significantly or is it efficient? Commit to your answer.
Concept: You can apply several built-in tests on the same or different columns to cover multiple data quality rules at once.
In your schema file, you list multiple tests under each column or model. dbt runs all tests in one build, reporting each test's pass or fail status separately. This helps catch different types of data issues in one go.
Result
You get a detailed report showing which tests passed or failed per column, enabling focused fixes.
Knowing how to combine tests efficiently helps build robust data validation without extra coding.
6
ExpertOptimizing Built-in Tests for Large Data
🤔Before reading on: do you think built-in tests always scan entire tables or can they be optimized? Commit to your answer.
Concept: Built-in tests run SQL queries that can be expensive on big tables; optimizing them improves performance.
You can optimize tests by limiting scope with filters, using incremental models, or running tests on sampled data. Also, understanding how dbt compiles tests into SQL helps you customize queries for speed. Some databases support indexes or constraints that speed up tests.
Result
Tests run faster and use fewer resources, making them practical for production pipelines.
Understanding test optimization prevents slow builds and resource waste in real-world data projects.
Under the Hood
Built-in tests in dbt are SQL queries generated from simple YAML configurations. When you run dbt test, it compiles these configurations into SQL that checks conditions like duplicates, nulls, or foreign key matches. The database executes these queries and returns results indicating pass or fail. dbt then summarizes these results for you.
Why designed this way?
dbt uses SQL because it works directly with your data warehouse, leveraging its power and avoiding data movement. The YAML config makes tests easy to write and maintain without complex coding. This design balances simplicity for users and efficiency in execution.
┌───────────────┐
│ YAML Test     │
│ Configuration │
└──────┬────────┘
       │ compiles to
┌──────▼────────┐
│ SQL Query     │
│ (test logic)  │
└──────┬────────┘
       │ runs on
┌──────▼────────┐
│ Data Warehouse│
│ (your data)   │
└──────┬────────┘
       │ returns
┌──────▼────────┐
│ Test Results  │
│ (pass/fail)   │
└───────────────┘
Myth Busters - 3 Common Misconceptions
Quick: Do you think the unique test checks for null values as duplicates? Commit yes or no.
Common Belief:The unique test treats null values as duplicates and fails if multiple nulls exist.
Tap to reveal reality
Reality:Most databases treat nulls as distinct, so multiple nulls do not cause unique test failure.
Why it matters:Assuming nulls cause unique test failure can lead to unnecessary data fixes or confusion about test results.
Quick: Do you think accepted_values test can check numeric ranges? Commit yes or no.
Common Belief:Accepted_values test can validate numeric ranges like 'between 1 and 10'.
Tap to reveal reality
Reality:Accepted_values only checks if values exactly match a list; it cannot check ranges.
Why it matters:Expecting range checks from accepted_values leads to missed invalid data unless custom tests are used.
Quick: Do you think relationships test ensures referential integrity like database foreign keys? Commit yes or no.
Common Belief:Relationships test enforces the same strict referential integrity as database foreign keys.
Tap to reveal reality
Reality:Relationships test only checks data at test time; it does not enforce constraints continuously like foreign keys.
Why it matters:Relying solely on relationships tests without database constraints can allow bad data between test runs.
Expert Zone
1
Built-in tests generate SQL that can behave differently depending on your database's null handling and indexing, affecting test results and performance.
2
Tests run in the context of your dbt model's compiled SQL, so changes in model logic can affect test outcomes unexpectedly.
3
Combining tests with custom SQL or macros allows flexible validation beyond built-in capabilities, but requires careful design to maintain clarity.
When NOT to use
Built-in tests are limited to simple checks and may not suit complex validation logic like pattern matching or cross-dataset consistency. In such cases, use custom tests or external data quality tools like Great Expectations or Deequ.
Production Patterns
Teams integrate built-in tests into CI/CD pipelines to automatically validate data on every change. They combine tests with alerting systems to notify data engineers immediately on failures, enabling fast issue resolution and maintaining trust.
Connections
Data Validation
Built-in tests are a practical implementation of data validation principles.
Understanding general data validation helps grasp why built-in tests focus on uniqueness, completeness, and correctness.
Database Constraints
Built-in tests mimic some database constraints but run as queries rather than enforced rules.
Knowing database constraints clarifies the limits of built-in tests and when to rely on database-level enforcement.
Quality Control in Manufacturing
Built-in tests serve a similar role as quality control checks in factories, ensuring products meet standards before shipping.
Seeing data tests as quality control highlights their role in preventing defects and maintaining trust.
Common Pitfalls
#1Applying unique test on columns that allow nulls without understanding null behavior.
Wrong approach:tests: - unique: column_name: user_id where: user_id IS NOT NULL
Correct approach:tests: - unique: column_name: user_id
Root cause:Misunderstanding that unique test treats nulls as distinct, so filtering nulls is unnecessary and may hide issues.
#2Using accepted_values test to check numeric ranges instead of exact values.
Wrong approach:tests: - accepted_values: column_name: age values: [18, 19, 20, 21, 22, 23, 24, 25]
Correct approach:Use a custom test with SQL condition: age BETWEEN 18 AND 25
Root cause:Confusing accepted_values as a range checker rather than a list membership test.
#3Assuming relationships test enforces referential integrity continuously.
Wrong approach:Relying only on dbt relationships test without database foreign keys.
Correct approach:Add foreign key constraints in the database alongside dbt relationships tests.
Root cause:Not realizing that dbt tests run on demand and do not prevent bad data insertion.
Key Takeaways
Built-in tests in dbt are simple, reusable checks that help keep your data accurate and trustworthy.
They cover common data quality rules like uniqueness, completeness, allowed values, and relationships between tables.
These tests run as SQL queries generated from easy YAML configurations, making them accessible without deep coding.
Understanding their behavior and limits helps you apply them effectively and avoid common mistakes.
Combining built-in tests with database constraints and custom tests creates a strong data quality framework in production.

Practice

(1/5)
1. What does the built-in unique test in dbt check for in a column?
easy
A. It checks that the column has no missing (null) values.
B. It checks that the column values exist in another table's column.
C. It checks that the column values match a predefined list of accepted values.
D. It checks that all values in the column are different with no duplicates.

Solution

  1. Step 1: Understand the purpose of the unique test

    The unique test ensures that each value in the specified column appears only once, meaning no duplicates.
  2. Step 2: Compare with other test types

    Other tests like not_null check for missing values, accepted_values check for allowed values, and relationships check for foreign key matches.
  3. Final Answer:

    It checks that all values in the column are different with no duplicates. -> Option D
  4. Quick Check:

    unique test = no duplicates [OK]
Hint: Unique means no duplicates allowed in the column [OK]
Common Mistakes:
  • Confusing unique with not_null test
  • Thinking unique checks accepted values
  • Mixing unique with relationships test
2. Which of the following is the correct syntax to add a not_null test on the column user_id in a dbt model YAML file?
easy
A. columns: - name: user_id tests: - not_null
B. columns: - user_id: tests: - not_null
C. tests: - not_null: user_id
D. columns: - name: user_id test: not_null

Solution

  1. Step 1: Recall YAML structure for dbt tests

    Tests are added under the columns list, each column has a name and a tests list with test names.
  2. Step 2: Identify correct indentation and keys

    columns: - name: user_id tests: - not_null correctly uses 'name' for the column and 'tests' as a list with '- not_null'. Other options have wrong keys or structure.
  3. Final Answer:

    columns: - name: user_id tests: - not_null -> Option A
  4. Quick Check:

    YAML tests under columns with name and tests list [OK]
Hint: Use 'name' and 'tests' keys with proper indentation [OK]
Common Mistakes:
  • Using 'test' instead of 'tests'
  • Incorrect indentation breaking YAML
  • Placing tests outside columns section
3. Given this YAML snippet in a dbt model:
columns:
  - name: status
    tests:
      - accepted_values:
          values: ['active', 'inactive', 'pending']
What happens if the status column contains the value 'deleted' when you run dbt test?
medium
A. The test passes because 'deleted' is a valid string.
B. The test fails because 'deleted' is not in the accepted values list.
C. The test is skipped because accepted_values only checks for nulls.
D. The test throws a syntax error due to incorrect YAML.

Solution

  1. Step 1: Understand accepted_values test behavior

    The accepted_values test checks if all column values are within the specified list.
  2. Step 2: Check if 'deleted' is in the list

    'deleted' is not in ['active', 'inactive', 'pending'], so the test will fail.
  3. Final Answer:

    The test fails because 'deleted' is not in the accepted values list. -> Option B
  4. Quick Check:

    accepted_values rejects values outside list [OK]
Hint: Accepted_values fails if any value is outside the list [OK]
Common Mistakes:
  • Assuming test passes if value is a string
  • Confusing accepted_values with not_null
  • Thinking test skips unknown values
4. You wrote this test in your dbt model YAML:
columns:
  - name: order_id
    tests:
      - relationships:
          to: ref('orders')
But running dbt test gives an error. What is the most likely cause?
medium
A. The 'field' key is missing in the relationships test.
B. The 'to' value should be a string, not a ref function.
C. The relationships test requires the 'field' to be the same as the column name.
D. The 'to' value must be a table name string, not a ref function.

Solution

  1. Step 1: Understand relationships test syntax

    The relationships test requires both 'to' (target table) and 'field' (target column).
  2. Step 2: Identify the error cause

    The YAML is missing the 'field' key, causing a configuration error when running dbt test.
  3. Final Answer:

    The 'field' key is missing in the relationships test. -> Option A
  4. Quick Check:

    relationships 'to' + 'field' required [OK]
Hint: relationships test requires 'to' and 'field' keys [OK]
Common Mistakes:
  • Using ref() in YAML instead of table name string
  • Omitting the 'field' key
  • Assuming 'field' must match column name
5. You want to ensure that the customer_id column in your orders model is unique, not null, and only contains values that exist in the customers table's id column. Which combination of built-in tests should you add in your YAML?
hard
A. - not_null - accepted_values: values: [unique] - relationships: to: customers field: id
B. - unique - accepted_values: values: [not null] - relationships: to: customers field: id
C. - unique - not_null - relationships: to: customers field: id
D. - unique - not_null - accepted_values: values: [customer_id]

Solution

  1. Step 1: Identify tests for uniqueness and non-null

    Use 'unique' to ensure no duplicates and 'not_null' to prevent missing values.
  2. Step 2: Ensure foreign key relationship

    Use 'relationships' test with 'to' as 'customers' table and 'field' as 'id' to check existence.
  3. Step 3: Verify other options

    Options B, C, and D misuse accepted_values or mix concepts incorrectly.
  4. Final Answer:

    - unique - not_null - relationships: to: customers field: id -> Option C
  5. Quick Check:

    unique + not_null + relationships = correct tests [OK]
Hint: Combine unique, not_null, and relationships for full check [OK]
Common Mistakes:
  • Using accepted_values to check null or uniqueness
  • Misconfiguring relationships test
  • Missing one of the required tests