0
0
dbtdata~15 mins

Custom singular tests in dbt - Deep Dive

Choose your learning style9 modes available
Overview - Custom singular tests
What is it?
Custom singular tests in dbt are user-defined checks that validate specific conditions on your data. Unlike generic tests that check for common issues like uniqueness or nulls, singular tests let you write custom SQL queries to verify any rule you want. They return a pass or fail based on whether the query returns any rows. This helps ensure your data meets precise business or quality requirements.
Why it matters
Without custom singular tests, you might miss important data problems unique to your project. They let you catch errors early by enforcing rules that generic tests can't cover. This improves trust in your data and prevents bad decisions based on faulty information. Imagine relying on data reports that silently contain errors because no test checked those specific cases.
Where it fits
Before learning custom singular tests, you should understand basic dbt tests like uniqueness and not_null. After mastering singular tests, you can explore advanced testing strategies like test macros and automated test suites. This topic fits in the data quality and validation part of the dbt learning path.
Mental Model
Core Idea
A custom singular test is a special SQL query that returns rows only when data breaks a specific rule, signaling a test failure.
Think of it like...
It's like a smoke detector set to sense a very specific kind of smoke that only happens when a particular appliance malfunctions, alerting you only to that exact problem.
┌─────────────────────────────┐
│      Custom Singular Test    │
├─────────────────────────────┤
│ 1. Write SQL query checking  │
│    a specific data rule      │
│                             │
│ 2. Run query on dataset      │
│                             │
│ 3. If query returns rows →   │
│    test FAILS (problem found)│
│    else → test PASSES        │
└─────────────────────────────┘
Build-Up - 7 Steps
1
FoundationUnderstanding basic dbt tests
🤔
Concept: Learn what dbt tests are and how they check data quality with simple built-in tests.
dbt provides built-in tests like unique and not_null that check common data issues. For example, a unique test ensures no duplicate values in a column. These tests run SQL queries behind the scenes and fail if they find any problem rows.
Result
You can run dbt tests and see which columns or tables have data issues.
Knowing how basic tests work helps you understand why custom tests are needed for rules that built-in tests can't cover.
2
FoundationWhat makes a test singular in dbt
🤔
Concept: Singular tests are custom SQL queries that return rows only when a rule is broken.
A singular test is a SQL query you write that returns zero rows if data is good, or one or more rows if data breaks the rule. dbt treats any returned rows as a test failure.
Result
You understand that singular tests are about writing queries that detect specific data problems.
This concept flips the usual idea: the test passes when no rows are returned, which is key to writing effective singular tests.
3
IntermediateWriting your first custom singular test
🤔Before reading on: do you think the test should return rows when data is correct or when data is wrong? Commit to your answer.
Concept: Learn how to write a SQL query for a singular test that detects a specific data issue.
Create a .sql file in your tests folder with a query that selects rows violating your rule. For example, to check if any orders have negative amounts, write: SELECT * FROM {{ ref('orders') }} WHERE amount < 0 If this query returns rows, the test fails.
Result
You can run dbt test and see if your custom test passes or fails based on your data.
Understanding that the test fails only when the query returns rows helps you design precise checks.
4
IntermediateIntegrating singular tests in dbt project
🤔Before reading on: do you think singular tests are added like built-in tests in schema.yml or separately? Commit to your answer.
Concept: Learn how to register and run singular tests within your dbt project structure.
Singular tests are added in the tests directory and referenced in your dbt_project.yml under 'tests'. You run them with 'dbt test'. Unlike generic tests, they are not declared in schema.yml but run as standalone SQL files.
Result
Your custom singular tests become part of your regular dbt test runs.
Knowing the difference in how singular tests are integrated prevents confusion and ensures proper test execution.
5
IntermediateUsing variables and macros in singular tests
🤔Before reading on: can singular tests use dbt macros and variables like models? Commit to your answer.
Concept: Singular tests can use dbt's templating features to make tests reusable and dynamic.
You can use Jinja macros and variables inside singular test SQL files. For example, use {{ ref('model_name') }} to refer to models dynamically or pass variables to customize test behavior.
Result
Tests become flexible and easier to maintain across environments.
Leveraging macros in tests unlocks powerful reuse and consistency in data validation.
6
AdvancedHandling test failures and error messages
🤔Before reading on: do singular tests provide detailed error messages by default? Commit to your answer.
Concept: Learn how to customize the output of singular tests to make failures easier to understand.
By selecting specific columns or adding descriptive messages in your test query, you can make test failures more informative. For example: SELECT id, amount, 'Negative amount found' AS error_message FROM {{ ref('orders') }} WHERE amount < 0 This helps quickly identify and fix data issues.
Result
Test failure outputs are clearer and actionable.
Improving error messages reduces debugging time and improves data quality workflows.
7
ExpertAdvanced patterns and pitfalls in singular tests
🤔Before reading on: do you think singular tests can cause performance issues if not written carefully? Commit to your answer.
Concept: Explore best practices and common mistakes in writing singular tests for production use.
Singular tests run SQL queries on your data warehouse, so inefficient queries can slow down your pipeline. Avoid heavy joins or large scans. Also, be careful with tests that depend on volatile data or timing, as they may cause flaky failures. Use incremental logic or limit scope when possible.
Result
You write efficient, reliable singular tests that scale with your data.
Understanding performance and stability considerations prevents costly pipeline slowdowns and false alarms.
Under the Hood
When you run a custom singular test, dbt executes the SQL query you wrote against your data warehouse. The query returns zero or more rows. dbt interprets zero rows as a passing test and any rows as a failure. Internally, dbt collects the query results and reports them in the test output. This mechanism leverages the database's query engine to perform the validation.
Why designed this way?
This design lets users write any SQL logic needed to check complex or unique data rules without being limited to predefined test types. It uses the database's power directly, avoiding extra layers or custom code. The simplicity of 'rows means failure' makes test results easy to interpret and consistent.
┌───────────────┐      ┌───────────────┐      ┌───────────────────────────┐
│ Custom SQL    │─────▶│ Database      │─────▶│ Result Rows              │
│ Singular Test │      │ Query Engine  │      │ (0 = pass, >0 = fail)    │
└───────────────┘      └───────────────┘      └───────────────────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does a singular test pass when its query returns rows? Commit yes or no.
Common Belief:If the test query returns rows, the test passes because it found data.
Tap to reveal reality
Reality:A singular test fails if the query returns any rows; zero rows means pass.
Why it matters:Misunderstanding this causes tests to be written backwards, letting bad data pass unnoticed.
Quick: Can singular tests be declared in schema.yml like generic tests? Commit yes or no.
Common Belief:Singular tests are added in schema.yml just like built-in tests.
Tap to reveal reality
Reality:Singular tests are standalone SQL files and are not declared in schema.yml.
Why it matters:Trying to declare singular tests in schema.yml leads to errors or tests not running.
Quick: Do singular tests always run fast regardless of query complexity? Commit yes or no.
Common Belief:Singular tests are lightweight and won't affect performance much.
Tap to reveal reality
Reality:Complex or unoptimized singular test queries can slow down your dbt runs significantly.
Why it matters:Ignoring performance can cause slow pipelines and delayed data delivery.
Quick: Can singular tests only check one row at a time? Commit yes or no.
Common Belief:Singular tests only check single rows or simple conditions.
Tap to reveal reality
Reality:Singular tests can check complex conditions involving multiple rows, joins, and aggregations.
Why it matters:Underestimating their power limits how effectively you use singular tests for data quality.
Expert Zone
1
Singular tests can leverage dbt's Jinja context to dynamically adapt to environments or model changes, making tests more maintainable.
2
Because singular tests return rows on failure, selecting meaningful columns and messages is crucial for quick debugging in large datasets.
3
Singular tests can be combined with hooks or run as part of CI/CD pipelines to enforce data quality gates before deployment.
When NOT to use
Avoid singular tests for very simple checks like uniqueness or nulls where built-in generic tests are more efficient and standardized. Also, do not use singular tests for extremely large datasets without optimization, as they can cause performance issues. Instead, use incremental tests or sampling strategies.
Production Patterns
In production, teams use singular tests to enforce business rules like 'no orders with negative amounts' or 'all active users must have verified emails.' They integrate these tests into automated pipelines, failing builds if tests fail, ensuring only clean data moves downstream.
Connections
Unit Testing in Software Development
Both involve writing specific checks that pass or fail based on conditions.
Understanding singular tests as data unit tests helps grasp their role in catching precise errors early, just like code unit tests catch bugs.
Database Constraints
Singular tests complement database constraints by checking rules that constraints cannot enforce.
Knowing database constraints clarifies why singular tests are needed for complex or business-specific validations beyond schema rules.
Quality Control in Manufacturing
Both use targeted inspections to detect defects that general checks might miss.
Seeing singular tests as quality control steps helps appreciate their role in maintaining high data standards.
Common Pitfalls
#1Writing a singular test query that returns rows when data is correct.
Wrong approach:SELECT * FROM {{ ref('orders') }} WHERE amount >= 0
Correct approach:SELECT * FROM {{ ref('orders') }} WHERE amount < 0
Root cause:Misunderstanding that singular tests fail when rows are returned, so the query must select only bad data.
#2Declaring singular tests inside schema.yml like generic tests.
Wrong approach:tests: - singular_test_name
Correct approach:Place singular test SQL files in the tests directory and run with 'dbt test'.
Root cause:Confusing singular tests with generic tests and their different integration methods.
#3Writing complex joins in singular tests without optimization causing slow runs.
Wrong approach:SELECT * FROM big_table a JOIN big_table b ON a.id = b.ref_id WHERE a.status = 'bad'
Correct approach:Limit data scanned or pre-aggregate in models before testing, or add filters to reduce rows.
Root cause:Not considering query performance impact of test SQL on large datasets.
Key Takeaways
Custom singular tests let you write any SQL query that returns rows only when data breaks a specific rule, signaling failure.
They are separate from generic tests and run as standalone SQL files, not declared in schema.yml.
Writing effective singular tests requires understanding that zero rows means pass and any rows mean fail.
Using macros and clear error messages in singular tests improves maintainability and debugging.
Performance matters: inefficient singular test queries can slow down your dbt runs and cause flaky failures.