Bird
Raised Fist0
dbtdata~15 mins

Running tests with dbt test - Deep Dive

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Overview - Running tests with dbt test
What is it?
Running tests with dbt test means checking your data models to make sure they are correct and reliable. dbt test runs predefined or custom tests on your data to find errors or unexpected values. This helps catch problems early before using the data for analysis or reports. It is like a safety check for your data pipelines.
Why it matters
Without running tests, data errors can go unnoticed and cause wrong decisions or wasted time fixing issues later. Tests help maintain trust in your data by automatically verifying assumptions and quality. This saves effort and improves confidence when sharing data insights with others. It also speeds up finding and fixing problems.
Where it fits
Before running dbt test, you should know how to build data models with dbt and write SQL queries. After learning tests, you can explore advanced testing strategies, continuous integration, and monitoring data quality in production pipelines.
Mental Model
Core Idea
Running tests with dbt test automatically checks your data models for correctness and quality before using them.
Think of it like...
It's like proofreading a document before sending it out to catch typos and mistakes that could confuse readers.
┌───────────────┐
│  dbt models   │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│  dbt test     │
│ (runs checks) │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ Test results  │
│ (pass/fail)   │
└───────────────┘
Build-Up - 7 Steps
1
FoundationWhat is dbt test command
🤔
Concept: Introducing the dbt test command and its purpose.
The dbt test command runs tests defined in your dbt project. These tests check your data models for issues like missing values or duplicates. You run it in your terminal with 'dbt test'. It helps catch data problems early.
Result
dbt runs all tests and shows which pass or fail in the terminal.
Understanding the dbt test command is the first step to automating data quality checks.
2
FoundationTypes of tests in dbt
🤔
Concept: Learn about built-in and custom tests in dbt.
dbt has built-in tests like 'unique', 'not_null', and 'accepted_values' you can add to your models. You can also write custom SQL tests for specific rules. Tests are defined in YAML files or SQL files inside your project.
Result
You can specify tests to check different data quality rules easily.
Knowing test types helps you pick the right checks for your data needs.
3
IntermediateWriting schema tests in YAML
🤔Before reading on: do you think schema tests are written in SQL or YAML? Commit to your answer.
Concept: How to define tests in schema YAML files linked to models.
Schema tests are written in YAML files next to your models. You list columns and attach tests like unique or not_null. For example: models: - name: customers columns: - name: id tests: - unique - not_null This tells dbt to check the 'id' column for uniqueness and no missing values.
Result
dbt knows which tests to run on which columns when you run 'dbt test'.
Using YAML for tests keeps your checks organized and close to your data models.
4
IntermediateCreating custom SQL tests
🤔Before reading on: do you think custom tests must always check one column or can they check multiple columns? Commit to your answer.
Concept: Writing your own SQL queries as tests for complex rules.
Custom tests are SQL files that return rows when data fails the test. For example, a test to find customers with negative age: SELECT * FROM {{ ref('customers') }} WHERE age < 0 If this query returns rows, the test fails. You save this SQL in the 'tests' folder and reference it in your YAML.
Result
You can check any condition you want beyond built-in tests.
Custom SQL tests give you full flexibility to enforce business rules.
5
IntermediateInterpreting dbt test results
🤔Before reading on: do you think dbt test stops at first failure or reports all failures? Commit to your answer.
Concept: Understanding the output format and meaning of test results.
When you run 'dbt test', it runs all tests and shows a summary. It lists which tests passed and which failed, including how many rows failed. It does not stop at first failure but runs all tests to give a full report.
Result
You get a clear picture of data quality issues to fix.
Knowing how to read test results helps prioritize data fixes effectively.
6
AdvancedAutomating tests in CI/CD pipelines
🤔Before reading on: do you think running dbt test manually is enough for production data quality? Commit to your answer.
Concept: Integrating dbt tests into automated workflows for continuous quality checks.
In production, you automate 'dbt test' in CI/CD pipelines so tests run on every code change or data update. This catches errors early without manual effort. You configure your pipeline to run 'dbt test' after building models and fail if tests fail.
Result
Data quality is continuously monitored and enforced automatically.
Automation ensures consistent data quality and faster feedback loops.
7
ExpertHandling flaky tests and test performance
🤔Before reading on: do you think all dbt tests run equally fast and reliably? Commit to your answer.
Concept: Understanding challenges with test reliability and speed in large projects.
Some tests can be flaky if data changes rapidly or tests depend on timing. Complex custom tests may run slowly on big datasets. Experts optimize tests by limiting scope, using incremental models, or caching results. They also handle flaky tests by retrying or marking them as warnings.
Result
Tests remain reliable and efficient even in complex environments.
Knowing test limitations and optimizations prevents false alarms and slow pipelines.
Under the Hood
dbt test works by compiling your test definitions into SQL queries that run against your database. Built-in tests generate standard SQL checks like counting duplicates or nulls. Custom tests run your SQL directly. The results are collected and interpreted: if any rows are returned, the test fails. dbt manages dependencies so tests run after models are built.
Why designed this way?
dbt was designed to leverage the power of SQL and the database engine for testing, avoiding moving data around. This keeps tests fast and scalable. Using YAML for schema tests separates test logic from SQL code, making tests easier to write and maintain. The design balances flexibility with simplicity.
┌───────────────┐
│ Test YAML     │
│ definitions   │
└──────┬────────┘
       │ compiles
       ▼
┌───────────────┐
│ SQL test      │
│ queries       │
└──────┬────────┘
       │ runs on
       ▼
┌───────────────┐
│ Database      │
│ engine        │
└──────┬────────┘
       │ returns rows if fail
       ▼
┌───────────────┐
│ dbt collects  │
│ results       │
└───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does dbt test fix data errors automatically? Commit yes or no.
Common Belief:dbt test automatically corrects data errors it finds.
Tap to reveal reality
Reality:dbt test only detects and reports errors; it does not fix data.
Why it matters:Expecting automatic fixes can lead to ignoring test failures and bad data persisting.
Quick: Do all tests run instantly regardless of data size? Commit yes or no.
Common Belief:All dbt tests run quickly no matter how big the data is.
Tap to reveal reality
Reality:Some tests, especially custom ones on large tables, can be slow and impact pipeline speed.
Why it matters:Ignoring test performance can cause slow deployments and frustrated teams.
Quick: Does a passing test guarantee perfect data? Commit yes or no.
Common Belief:If all dbt tests pass, the data is perfect and error-free.
Tap to reveal reality
Reality:Tests only check what you define; untested issues can still exist.
Why it matters:Relying solely on tests without good coverage can give false confidence.
Quick: Can you write tests that check multiple tables at once? Commit yes or no.
Common Belief:dbt tests can only check one table or model at a time.
Tap to reveal reality
Reality:Custom SQL tests can join multiple tables to check complex conditions.
Why it matters:Knowing this expands your ability to enforce cross-table data rules.
Expert Zone
1
Some tests may pass locally but fail in production due to data volume or environment differences.
2
Test failures can be caused by upstream model changes, so understanding dependencies is key to debugging.
3
Using tags and selective test runs helps manage large test suites efficiently.
When NOT to use
dbt test is not suitable for real-time streaming data validation or very high-frequency checks. For those, specialized data monitoring tools or database triggers are better. Also, for extremely complex validations, external data quality platforms may be preferred.
Production Patterns
In production, teams integrate dbt test into CI/CD pipelines with alerting on failures. They use test coverage reports to track quality over time. Some use test result artifacts to create dashboards showing data health trends.
Connections
Unit Testing in Software Development
dbt tests are like unit tests but for data models instead of code functions.
Understanding software unit testing helps grasp why automated data tests improve reliability and speed up development.
Data Quality Management
dbt test is a practical tool within the broader discipline of managing data quality.
Knowing data quality principles helps design better tests that cover critical data issues.
Continuous Integration / Continuous Deployment (CI/CD)
dbt test integrates into CI/CD pipelines to automate data validation on every change.
Understanding CI/CD concepts clarifies how automated tests fit into modern data engineering workflows.
Common Pitfalls
#1Ignoring test failures and proceeding with data analysis.
Wrong approach:dbt test # test fails but user ignores and continues using data
Correct approach:dbt test # test fails # user investigates and fixes data or models before proceeding
Root cause:Misunderstanding that tests are warnings, not blockers, leads to ignoring critical data issues.
#2Writing overly broad custom tests that return too many false positives.
Wrong approach:SELECT * FROM {{ ref('orders') }} WHERE order_date < '1900-01-01' OR order_date IS NULL OR order_date > CURRENT_DATE + 1000
Correct approach:SELECT * FROM {{ ref('orders') }} WHERE order_date < '1900-01-01' OR order_date IS NULL
Root cause:Trying to cover too many edge cases in one test causes noise and confusion.
#3Defining tests in the wrong YAML file or forgetting to link them to models.
Wrong approach:tests: - unique: column_name: id # but placed outside model schema YAML
Correct approach:models: - name: customers columns: - name: id tests: - unique
Root cause:Not following dbt's YAML structure causes tests to be ignored.
Key Takeaways
Running tests with dbt test is essential to catch data errors early and maintain trust in your data.
dbt supports built-in and custom tests, letting you check simple rules and complex business logic.
Tests are defined close to models in YAML or as SQL files, keeping your project organized.
Automating tests in CI/CD pipelines ensures continuous data quality without manual effort.
Understanding test results and limitations helps you fix issues efficiently and avoid false confidence.

Practice

(1/5)
1. What is the main purpose of running dbt test in a dbt project?
easy
A. To deploy the dbt project to production
B. To build new data models from raw data
C. To check data quality and find errors in your data models
D. To generate documentation for your data models

Solution

  1. Step 1: Understand the role of dbt test

    dbt test runs tests defined in your project to check data quality and catch errors.
  2. Step 2: Differentiate from other dbt commands

    Building models is done with dbt run, deployment is outside dbt test, and documentation is generated with dbt docs.
  3. Final Answer:

    To check data quality and find errors in your data models -> Option C
  4. Quick Check:

    dbt test = data quality checks [OK]
Hint: Remember: test = check data quality, run = build models [OK]
Common Mistakes:
  • Confusing dbt test with dbt run
  • Thinking dbt test deploys models
  • Assuming dbt test generates docs
2. Which of the following is the correct command to run tests only on a specific model named customers?
easy
A. dbt test --select customers
B. dbt run --models customers
C. dbt test --models customers
D. dbt test --only customers

Solution

  1. Step 1: Identify the correct flag for running tests on specific models

    The flag --select is used with dbt test to specify which models to test.
  2. Step 2: Check other options for correctness

    dbt run builds models, not tests. --models is not a valid flag for dbt test. --only is not a valid flag.
  3. Final Answer:

    dbt test --select customers -> Option A
  4. Quick Check:

    Use --select to target tests [OK]
Hint: Use --select flag to run tests on specific models [OK]
Common Mistakes:
  • Using dbt run instead of dbt test
  • Using invalid flags like --models or --only
  • Confusing --select with other flags
3. Given this schema.yml test definition:
models:
  - name: orders
    tests:
      - unique:
          column_name: order_id
      - not_null:
          column_name: order_date

What will dbt test check for the orders model?
medium
A. It checks that order_id is unique and order_date has no null values
B. It checks that order_id has no null values and order_date is unique
C. It checks that both order_id and order_date are unique
D. It checks that both order_id and order_date have no null values

Solution

  1. Step 1: Read the test types for each column

    The test unique applies to order_id, ensuring no duplicates. The test not_null applies to order_date, ensuring no missing values.
  2. Step 2: Match tests to their meaning

    unique means no duplicates; not_null means no nulls. So the checks are: order_id is unique and order_date has no null values.
  3. Final Answer:

    It checks that order_id is unique and order_date has no null values -> Option A
  4. Quick Check:

    unique = no duplicates, not_null = no nulls [OK]
Hint: unique = no duplicates, not_null = no nulls [OK]
Common Mistakes:
  • Mixing up unique and not_null tests
  • Assuming both columns have the same test
  • Ignoring the column_name key in test definitions
4. You run dbt test but get an error: Compilation Error: Could not find test 'uniquee'. What is the likely cause?
medium
A. The test passed with no errors
B. The model name is incorrect
C. The database connection is missing
D. A typo in the test name in schema.yml

Solution

  1. Step 1: Analyze the error message

    The error says it cannot find test 'uniquee', which looks like a misspelled test name.
  2. Step 2: Identify common causes of compilation errors

    Typos in test names in schema.yml cause dbt to fail to find the test. Model name or connection errors produce different messages.
  3. Final Answer:

    A typo in the test name in schema.yml -> Option D
  4. Quick Check:

    Compilation errors often mean typos [OK]
Hint: Check spelling of test names in schema.yml [OK]
Common Mistakes:
  • Ignoring typo errors and rerunning blindly
  • Assuming connection issues cause compilation errors
  • Confusing model name errors with test name errors
5. You want to ensure that the email column in your users model is unique and not null. You also want to run tests only on this model. Which schema.yml snippet and command combination is correct?
hard
A.
models:
  - name: users
    tests:
      - unique
      - not_null

Command: dbt test --models users
B.
models:
  - name: users
    columns:
      - name: email
        tests:
          - unique
          - not_null

Command: dbt test --select users
C.
models:
  - name: users
    columns:
      - name: email
        tests:
          - unique
          - not_null

Command: dbt run --models users
D.
models:
  - name: users
    tests:
      - unique
      - not_null

Command: dbt test --select users

Solution

  1. Step 1: Identify correct test syntax in schema.yml

    Tests on columns are defined under columns with name and tests list.
    models:
      - name: users
        columns:
          - name: email
            tests:
              - unique
              - not_null

    Command: dbt test --select users shows the correct format.
  2. Step 2: Choose the correct command to run tests on the users model

    dbt test --select users runs tests only on the users model.
    models:
      - name: users
        columns:
          - name: email
            tests:
              - unique
              - not_null

    Command: dbt test --select users uses this command correctly.
  3. Final Answer:

    Option B with column-level tests and dbt test --select users -> Option B
  4. Quick Check:

    Column tests + --select flag = correct [OK]
Hint: Define tests under columns, run with --select flag [OK]
Common Mistakes:
  • Defining tests directly under model without column_name (invalid syntax)
  • Using dbt run instead of dbt test
  • Using incorrect flags like --models