0
0
dbtdata~15 mins

Unit testing dbt models - Deep Dive

Choose your learning style9 modes available
Overview - Unit testing dbt models
What is it?
Unit testing dbt models means checking small parts of your data transformation code to make sure they work correctly. In dbt, models are SQL files that create tables or views in your database. Unit tests help catch mistakes early by verifying that each model produces the expected results or meets certain rules. This makes your data pipeline more reliable and easier to maintain.
Why it matters
Without unit testing, errors in data models can go unnoticed until they cause bigger problems downstream, like wrong reports or bad decisions. Unit tests save time and effort by finding bugs early, improving trust in your data. They also make it easier to change or add new models safely, because you can quickly check if anything breaks. In real life, this means better data quality and faster development.
Where it fits
Before learning unit testing dbt models, you should understand basic dbt concepts like models, sources, and how dbt runs SQL transformations. After mastering unit testing, you can explore integration testing, data quality testing, and advanced dbt features like snapshots and macros. Unit testing is a key step in building robust data pipelines.
Mental Model
Core Idea
Unit testing dbt models is like checking each small building block of your data pipeline to ensure it works perfectly before stacking them together.
Think of it like...
Imagine building a LEGO castle. Before connecting many pieces, you check each LEGO block to make sure it’s not broken or misshaped. This way, the whole castle will be strong and stable.
┌───────────────┐
│  Raw Data     │
└──────┬────────┘
       │
┌──────▼────────┐
│  dbt Model 1  │  <-- Unit test checks this model's output
└──────┬────────┘
       │
┌──────▼────────┐
│  dbt Model 2  │  <-- Unit test checks this model's output
└──────┬────────┘
       │
┌──────▼────────┐
│ Final Dataset │
└───────────────┘
Build-Up - 7 Steps
1
FoundationUnderstanding dbt Models Basics
🤔
Concept: Learn what dbt models are and how they transform raw data using SQL.
dbt models are SQL files that create tables or views in your database. Each model runs a SELECT query to transform data. For example, a model might filter rows or calculate new columns. Running dbt builds these models in order, creating a data pipeline.
Result
You can write and run simple dbt models that create tables with transformed data.
Knowing what a dbt model is helps you see what you need to test: the SQL logic that shapes your data.
2
FoundationWhat is Unit Testing in dbt?
🤔
Concept: Unit testing means checking small parts of your dbt models to confirm they behave as expected.
In dbt, unit tests check if a model's output meets certain conditions, like no null values in a column or unique IDs. These tests run automatically and alert you if something is wrong. dbt supports tests written as SQL queries or built-in test types.
Result
You understand that unit tests are small checks on your data models to catch errors early.
Seeing unit tests as small checks on models makes testing manageable and focused.
3
IntermediateWriting Basic dbt Unit Tests
🤔Before reading on: do you think dbt tests are written in Python, SQL, or another language? Commit to your answer.
Concept: Learn how to write simple tests using dbt's built-in test types and custom SQL tests.
dbt has built-in tests like 'unique', 'not_null', and 'accepted_values' that you add in your model's schema.yml file. For example, to check a column has no nulls, you write: models: - name: my_model columns: - name: id tests: - not_null - unique You can also write custom SQL tests as separate files that return rows when the test fails.
Result
You can add tests to your dbt project that run automatically and check your data quality.
Knowing how to write tests in dbt's schema files makes testing part of your normal workflow.
4
IntermediateRunning and Interpreting dbt Test Results
🤔Before reading on: do you think dbt test stops the whole pipeline on failure or reports errors but continues? Commit to your answer.
Concept: Learn how to run tests and understand their output to fix data issues.
You run tests with the command 'dbt test'. dbt runs all tests and shows which passed or failed. If a test fails, dbt shows the rows causing the failure. This helps you find and fix data problems quickly. Tests do not stop the whole pipeline but alert you to issues.
Result
You can confidently run tests and use the results to improve your data models.
Understanding test results helps you quickly identify and fix data quality problems.
5
AdvancedCreating Custom SQL Unit Tests
🤔Before reading on: do you think custom tests in dbt return rows when passing or failing? Commit to your answer.
Concept: Learn to write your own SQL queries as tests for complex conditions not covered by built-in tests.
Custom tests are SQL files in the 'tests' folder. They return rows when the test fails. For example, to check no orders have negative amounts: SELECT * FROM {{ ref('orders') }} WHERE amount < 0 If this query returns rows, the test fails. You can parameterize tests to reuse them for different models or columns.
Result
You can write flexible tests tailored to your specific data rules.
Knowing custom tests lets you cover edge cases and complex logic beyond built-in tests.
6
AdvancedMocking Data for Isolated Unit Tests
🤔Before reading on: do you think dbt unit tests run only on production data or can use test data? Commit to your answer.
Concept: Learn how to create small test datasets to isolate and test models without relying on full production data.
You can create seed files or temporary models with sample data to test your transformations in isolation. This helps you verify logic without waiting for full data loads. For example, create a CSV seed with test rows and run your model against it. This approach speeds up testing and debugging.
Result
You can test models quickly and safely with controlled data inputs.
Using mock data isolates tests from external changes, making debugging easier and faster.
7
ExpertAutomating Unit Tests in CI/CD Pipelines
🤔Before reading on: do you think dbt tests can run automatically on every code change? Commit to your answer.
Concept: Learn how to integrate dbt unit tests into automated workflows for continuous testing and deployment.
In professional projects, dbt tests run automatically in CI/CD systems like GitHub Actions or Jenkins. When you push code, the pipeline runs 'dbt run' and 'dbt test' commands. Failures block deployment, ensuring only tested code reaches production. This automation improves reliability and developer confidence.
Result
Your dbt models are continuously tested, reducing bugs and manual work.
Automating tests enforces discipline and quality in data projects, preventing costly errors.
Under the Hood
dbt tests run SQL queries against your database. Built-in tests generate SQL that checks conditions like uniqueness or nulls. Custom tests are raw SQL queries you write. When tests run, dbt executes these queries and checks if any rows are returned. If rows exist, the test fails, signaling data issues. Tests run in the same environment as models, ensuring consistency.
Why designed this way?
dbt uses SQL for tests because it works directly with the data where it lives, avoiding extra layers or languages. This design leverages database power and keeps tests simple and fast. Alternatives like external scripts would add complexity and reduce transparency. Using SQL also fits naturally with dbt's core purpose: transforming data with SQL.
┌───────────────┐
│  dbt Test Run │
└──────┬────────┘
       │
┌──────▼────────┐
│ Generate SQL  │  <-- built-in or custom test SQL
└──────┬────────┘
       │
┌──────▼────────┐
│ Execute Query │  <-- runs on database
└──────┬────────┘
       │
┌──────▼────────┐
│ Check Results │  <-- rows returned = test fail
└──────┬────────┘
       │
┌──────▼────────┐
│ Report Status │  <-- pass or fail
└───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do dbt unit tests check the entire data pipeline or just individual models? Commit to your answer.
Common Belief:Unit tests in dbt check the whole data pipeline end-to-end.
Tap to reveal reality
Reality:Unit tests focus on individual models or small parts, not the entire pipeline. End-to-end testing is a separate practice.
Why it matters:Confusing unit tests with full pipeline tests can lead to missed errors in how models connect or integrate.
Quick: Do dbt tests stop your data pipeline if they fail? Commit to your answer.
Common Belief:If a dbt test fails, the whole dbt run stops immediately.
Tap to reveal reality
Reality:dbt runs all tests and reports failures but does not stop the entire pipeline automatically.
Why it matters:Expecting immediate stops may cause false confidence or confusion about test coverage.
Quick: Are dbt tests written only in Python? Commit to your answer.
Common Belief:dbt tests require Python coding skills.
Tap to reveal reality
Reality:dbt tests are written in SQL or YAML configurations, not Python.
Why it matters:Thinking Python is needed can discourage non-programmers from writing tests.
Quick: Do custom dbt tests pass when they return rows? Commit to your answer.
Common Belief:If a custom test query returns rows, the test passes.
Tap to reveal reality
Reality:Custom tests fail when rows are returned; no rows means pass.
Why it matters:Misunderstanding this reverses test logic, causing false positives or negatives.
Expert Zone
1
Some dbt tests can be slow on large datasets; experts optimize tests by limiting data scanned or using indexes.
2
Tests can be parameterized and reused across models, reducing duplication and maintenance effort.
3
Understanding database transaction isolation helps avoid flaky test results caused by concurrent data changes.
When NOT to use
Unit testing is not enough for validating data freshness, schema changes, or integration between systems. For those, use integration tests, data quality tools like Great Expectations, or monitoring solutions.
Production Patterns
In production, teams combine unit tests with CI/CD pipelines, schedule nightly full test runs, and use test results to trigger alerts. Tests are part of a broader data quality framework including documentation and lineage.
Connections
Software Unit Testing
dbt unit testing applies the same principle of testing small code units to data transformations.
Understanding software unit testing helps grasp why testing small parts of data pipelines improves reliability and maintainability.
Database Constraints
dbt tests often check conditions similar to database constraints like uniqueness or not null.
Knowing database constraints clarifies what dbt tests enforce and how they complement database integrity rules.
Quality Control in Manufacturing
Unit testing dbt models is like quality checks on parts before assembly in manufacturing.
Seeing data tests as quality control highlights the importance of catching defects early to avoid costly failures later.
Common Pitfalls
#1Writing tests that check too much data at once, causing slow test runs.
Wrong approach:SELECT * FROM big_table WHERE some_column IS NULL;
Correct approach:SELECT * FROM big_table WHERE some_column IS NULL LIMIT 100;
Root cause:Not limiting data scanned in tests leads to performance issues and slow feedback.
#2Assuming tests run automatically without adding them to schema.yml or test folders.
Wrong approach:-- Writing a test SQL file but not referencing it in dbt project -- or not adding built-in tests in schema.yml -- and expecting tests to run.
Correct approach:Add tests in schema.yml under model columns or place custom test SQL files in tests/ folder and reference them properly.
Root cause:Misunderstanding how dbt discovers and runs tests causes tests to be skipped silently.
#3Writing custom tests that return no rows on failure instead of on success.
Wrong approach:SELECT * FROM orders WHERE amount >= 0; -- returns rows when data is good
Correct approach:SELECT * FROM orders WHERE amount < 0; -- returns rows only when data is bad
Root cause:Confusing test logic direction reverses pass/fail meaning, causing false confidence.
Key Takeaways
Unit testing dbt models means checking small parts of your data transformations to catch errors early.
dbt tests are written in SQL or YAML and run queries that return rows when tests fail.
Adding tests to your dbt project improves data quality and developer confidence.
Automating tests in CI/CD pipelines ensures continuous data reliability and faster feedback.
Understanding test logic and performance helps write effective and efficient tests.