0
0
dbtdata~15 mins

Built-in tests (unique, not_null, accepted_values, relationships) in dbt - Deep Dive

Choose your learning style9 modes available
Overview - Built-in tests (unique, not_null, accepted_values, relationships)
What is it?
Built-in tests in dbt are pre-made checks that help you find problems in your data automatically. They check if values are unique, not missing, within allowed options, or if relationships between tables are correct. These tests run every time you build your data models to keep data trustworthy. They save time by catching errors early without writing complex code.
Why it matters
Without these tests, bad data can silently cause wrong decisions, wasted effort, and lost trust. Built-in tests make it easy to spot data issues before they reach reports or dashboards. This means teams can rely on data confidently and fix problems quickly, improving overall data quality and business outcomes.
Where it fits
Before using built-in tests, you should understand basic SQL and how dbt models work. After mastering these tests, you can learn custom tests and advanced data quality frameworks. This topic fits in the data validation and quality assurance part of the data engineering and analytics workflow.
Mental Model
Core Idea
Built-in tests are automatic data quality checks that verify key rules about your data to keep it accurate and reliable.
Think of it like...
Imagine a factory quality inspector who checks every product for defects like missing parts, duplicates, or wrong sizes before shipping. Built-in tests do the same for your data, catching problems early.
┌───────────────┐      ┌───────────────┐      ┌───────────────┐
│   Data Table  │─────▶│ Built-in Test │─────▶│ Test Result   │
│ (your data)   │      │ (unique,      │      │ (pass/fail)   │
│               │      │  not_null,    │      │               │
│               │      │  accepted_    │      │               │
│               │      │  values,      │      │               │
│               │      │  relationships)│      │               │
└───────────────┘      └───────────────┘      └───────────────┘
Build-Up - 6 Steps
1
FoundationUnderstanding Unique Test Basics
🤔
Concept: The unique test checks if all values in a column are different, with no repeats.
In dbt, the unique test ensures that a column meant to identify records uniquely, like an ID, has no duplicates. You add it by specifying the column in your model's schema file under tests: unique. When run, dbt queries the data to find any repeated values.
Result
If duplicates exist, the test fails and shows which values repeat. If all values are unique, the test passes.
Understanding uniqueness is key because duplicate IDs or keys can cause wrong joins and data confusion downstream.
2
FoundationGrasping Not Null Test Purpose
🤔
Concept: The not_null test checks that a column has no missing (null) values.
Some columns must always have a value, like user email or order date. The not_null test in dbt checks these columns for any nulls. You add it similarly in the schema file under tests: not_null. dbt runs a query to find any rows missing values in that column.
Result
If any nulls are found, the test fails and lists the problem rows. If none, it passes.
Knowing which columns cannot be empty helps prevent errors in calculations or lookups that assume data is complete.
3
IntermediateUsing Accepted Values Test
🤔Before reading on: do you think accepted_values test only checks for exact matches or can it handle ranges? Commit to your answer.
Concept: The accepted_values test checks if column values are only from a specific allowed list.
This test is useful for columns with limited valid options, like status or category. You define the allowed values in the schema file under tests: accepted_values with a list. dbt then finds any values outside this list.
Result
If invalid values exist, the test fails and shows them. If all values are allowed, it passes.
Knowing how to restrict values prevents data entry errors and keeps categories consistent for analysis.
4
IntermediateExploring Relationships Test
🤔Before reading on: do you think relationships test checks if values exist in another table or if they match exactly? Commit to your answer.
Concept: The relationships test checks if values in one table appear in another, enforcing foreign key rules.
This test ensures referential integrity, like making sure every order's customer ID exists in the customers table. You specify the column and the related model and column in the schema file under tests: relationships. dbt queries for values missing in the related table.
Result
If unmatched values are found, the test fails and lists them. If all values match, it passes.
Understanding relationships tests helps maintain data consistency across tables, avoiding orphan records.
5
AdvancedCombining Multiple Built-in Tests
🤔Before reading on: do you think running multiple tests together slows down dbt significantly or is it efficient? Commit to your answer.
Concept: You can apply several built-in tests on the same or different columns to cover multiple data quality rules at once.
In your schema file, you list multiple tests under each column or model. dbt runs all tests in one build, reporting each test's pass or fail status separately. This helps catch different types of data issues in one go.
Result
You get a detailed report showing which tests passed or failed per column, enabling focused fixes.
Knowing how to combine tests efficiently helps build robust data validation without extra coding.
6
ExpertOptimizing Built-in Tests for Large Data
🤔Before reading on: do you think built-in tests always scan entire tables or can they be optimized? Commit to your answer.
Concept: Built-in tests run SQL queries that can be expensive on big tables; optimizing them improves performance.
You can optimize tests by limiting scope with filters, using incremental models, or running tests on sampled data. Also, understanding how dbt compiles tests into SQL helps you customize queries for speed. Some databases support indexes or constraints that speed up tests.
Result
Tests run faster and use fewer resources, making them practical for production pipelines.
Understanding test optimization prevents slow builds and resource waste in real-world data projects.
Under the Hood
Built-in tests in dbt are SQL queries generated from simple YAML configurations. When you run dbt test, it compiles these configurations into SQL that checks conditions like duplicates, nulls, or foreign key matches. The database executes these queries and returns results indicating pass or fail. dbt then summarizes these results for you.
Why designed this way?
dbt uses SQL because it works directly with your data warehouse, leveraging its power and avoiding data movement. The YAML config makes tests easy to write and maintain without complex coding. This design balances simplicity for users and efficiency in execution.
┌───────────────┐
│ YAML Test     │
│ Configuration │
└──────┬────────┘
       │ compiles to
┌──────▼────────┐
│ SQL Query     │
│ (test logic)  │
└──────┬────────┘
       │ runs on
┌──────▼────────┐
│ Data Warehouse│
│ (your data)   │
└──────┬────────┘
       │ returns
┌──────▼────────┐
│ Test Results  │
│ (pass/fail)   │
└───────────────┘
Myth Busters - 3 Common Misconceptions
Quick: Do you think the unique test checks for null values as duplicates? Commit yes or no.
Common Belief:The unique test treats null values as duplicates and fails if multiple nulls exist.
Tap to reveal reality
Reality:Most databases treat nulls as distinct, so multiple nulls do not cause unique test failure.
Why it matters:Assuming nulls cause unique test failure can lead to unnecessary data fixes or confusion about test results.
Quick: Do you think accepted_values test can check numeric ranges? Commit yes or no.
Common Belief:Accepted_values test can validate numeric ranges like 'between 1 and 10'.
Tap to reveal reality
Reality:Accepted_values only checks if values exactly match a list; it cannot check ranges.
Why it matters:Expecting range checks from accepted_values leads to missed invalid data unless custom tests are used.
Quick: Do you think relationships test ensures referential integrity like database foreign keys? Commit yes or no.
Common Belief:Relationships test enforces the same strict referential integrity as database foreign keys.
Tap to reveal reality
Reality:Relationships test only checks data at test time; it does not enforce constraints continuously like foreign keys.
Why it matters:Relying solely on relationships tests without database constraints can allow bad data between test runs.
Expert Zone
1
Built-in tests generate SQL that can behave differently depending on your database's null handling and indexing, affecting test results and performance.
2
Tests run in the context of your dbt model's compiled SQL, so changes in model logic can affect test outcomes unexpectedly.
3
Combining tests with custom SQL or macros allows flexible validation beyond built-in capabilities, but requires careful design to maintain clarity.
When NOT to use
Built-in tests are limited to simple checks and may not suit complex validation logic like pattern matching or cross-dataset consistency. In such cases, use custom tests or external data quality tools like Great Expectations or Deequ.
Production Patterns
Teams integrate built-in tests into CI/CD pipelines to automatically validate data on every change. They combine tests with alerting systems to notify data engineers immediately on failures, enabling fast issue resolution and maintaining trust.
Connections
Data Validation
Built-in tests are a practical implementation of data validation principles.
Understanding general data validation helps grasp why built-in tests focus on uniqueness, completeness, and correctness.
Database Constraints
Built-in tests mimic some database constraints but run as queries rather than enforced rules.
Knowing database constraints clarifies the limits of built-in tests and when to rely on database-level enforcement.
Quality Control in Manufacturing
Built-in tests serve a similar role as quality control checks in factories, ensuring products meet standards before shipping.
Seeing data tests as quality control highlights their role in preventing defects and maintaining trust.
Common Pitfalls
#1Applying unique test on columns that allow nulls without understanding null behavior.
Wrong approach:tests: - unique: column_name: user_id where: user_id IS NOT NULL
Correct approach:tests: - unique: column_name: user_id
Root cause:Misunderstanding that unique test treats nulls as distinct, so filtering nulls is unnecessary and may hide issues.
#2Using accepted_values test to check numeric ranges instead of exact values.
Wrong approach:tests: - accepted_values: column_name: age values: [18, 19, 20, 21, 22, 23, 24, 25]
Correct approach:Use a custom test with SQL condition: age BETWEEN 18 AND 25
Root cause:Confusing accepted_values as a range checker rather than a list membership test.
#3Assuming relationships test enforces referential integrity continuously.
Wrong approach:Relying only on dbt relationships test without database foreign keys.
Correct approach:Add foreign key constraints in the database alongside dbt relationships tests.
Root cause:Not realizing that dbt tests run on demand and do not prevent bad data insertion.
Key Takeaways
Built-in tests in dbt are simple, reusable checks that help keep your data accurate and trustworthy.
They cover common data quality rules like uniqueness, completeness, allowed values, and relationships between tables.
These tests run as SQL queries generated from easy YAML configurations, making them accessible without deep coding.
Understanding their behavior and limits helps you apply them effectively and avoid common mistakes.
Combining built-in tests with database constraints and custom tests creates a strong data quality framework in production.