How to Create Custom Tests in dbt for Data Quality
To create a custom test in
dbt, write a SQL file in the tests directory that returns rows when data fails the test. Then, define the test in your model's YAML file using test_name and any arguments. Run dbt test to execute your custom test.Syntax
A custom test in dbt is a SQL query that returns zero rows when the data passes the test. The basic syntax involves creating a SQL file in the tests/ folder with a query that selects rows violating your condition.
In your model's YAML file, you reference the test by name and pass any required arguments.
- SQL test file: Contains the test logic.
- YAML config: Links the test to a model column.
- Arguments: Parameters to customize the test.
sql
-- tests/not_null_custom.sql SELECT * FROM {{ model }} WHERE {{ column_name }} IS NULL
Example
This example creates a custom test to check that a column has no NULL values. The SQL test file uses {{ model }} and {{ column_name }} as variables. The YAML file applies the test to a specific column.
sql + yaml
-- tests/not_null_custom.sql
SELECT * FROM {{ model }} WHERE {{ column_name }} IS NULL
# models/schema.yml
version: 2
models:
- name: customers
columns:
- name: email
tests:
- not_null_custom:
column_name: emailOutput
If any rows in the 'customers' model have NULL in 'email', dbt test will fail and show those rows.
Common Pitfalls
Common mistakes when creating custom tests include:
- Not returning zero rows on success. The test SQL must return rows only when data fails.
- Forgetting to use
{{ model }}and{{ column_name }}placeholders for flexibility. - Misconfiguring the YAML file by not passing required arguments.
Here is an example of a wrong and right test SQL:
sql
-- Wrong test (returns all rows, always fails) SELECT * FROM customers WHERE email IS NULL OR email IS NOT NULL -- Right test (returns only failing rows) SELECT * FROM {{ model }} WHERE {{ column_name }} IS NULL
Quick Reference
| Step | Description |
|---|---|
| Create SQL test file | Write a query returning failing rows using {{ model }} and {{ column_name }} |
| Add test to YAML | Reference the test under model columns with any needed arguments |
| Run tests | Execute dbt test to validate data |
| Fix data | If test fails, correct data issues and rerun |
Key Takeaways
Custom tests in dbt are SQL queries returning rows only when data fails the test.
Use {{ model }} and {{ column_name }} placeholders in your test SQL for reusability.
Define your custom test in the model's YAML file with proper arguments.
Run dbt test to execute your custom tests and catch data quality issues.
Ensure your test SQL returns zero rows when data passes to avoid false failures.