Custom singular tests in dbt - Time & Space Complexity
When we write custom singular tests in dbt, we want to know how the time to run these tests changes as our data grows.
We ask: How does the test execution time grow when the data size increases?
Analyze the time complexity of the following dbt custom singular test.
select *
from {{ ref('my_model') }}  -- my_model is a placeholder model name
where some_column is null
This test returns every row in the model that has a null value in a specific column. It passes when the query returns no rows.
Look for repeated actions in the test query.
- Primary operation: Scanning all rows in the model to check the column value.
- How many times: Once for each row in the model.
The test checks every row once, so if the number of rows grows, the work grows too.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 checks |
| 100 | 100 checks |
| 1000 | 1000 checks |
Pattern observation: The number of operations grows directly with the number of rows.
Time Complexity: O(n)
This means the test takes longer in direct proportion to the number of rows it checks.
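The linear growth in the table above can be sketched in plain Python. This is only an illustration of a full scan, assuming no index, partition pruning, or other warehouse optimization; the helper names `count_null_checks` and `make_rows` and the sample data are hypothetical.

```python
def count_null_checks(rows, column):
    """Scan every row once, mimicking `where some_column is null`."""
    checks = 0
    failures = 0
    for row in rows:
        checks += 1  # one comparison per row
        if row[column] is None:
            failures += 1
    return checks, failures


def make_rows(n):
    # Hypothetical data: every tenth row has a null in some_column.
    return [{"some_column": None if i % 10 == 0 else i} for i in range(n)]


for n in (10, 100, 1000):
    checks, failures = count_null_checks(make_rows(n), "some_column")
    print(n, checks, failures)
```

The number of checks always equals the number of rows, which is the O(n) behavior described above.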
[X] Wrong: "The test runs instantly no matter how big the data is."
[OK] Correct: Because the test looks at every row, more rows mean more work and more time.
Understanding how tests scale with data size helps you write efficient checks and shows you think about performance in real projects.
"What if the test checked two columns instead of one? How would the time complexity change?"
Practice
Solution
Step 1: Understand the role of custom singular tests
Custom singular tests are SQL queries that check data quality by returning rows only when problems exist.
Step 2: Compare options with this definition
Only the option "To write your own SQL query that checks data quality and returns rows only if there are issues" matches the purpose of custom singular tests.
Final Answer:
To write your own SQL query that checks data quality and returns rows only if there are issues -> Option B
Quick Check:
Custom singular test = SQL check returning problem rows [OK]
- Confusing tests with documentation generation
- Thinking tests create tables
- Assuming tests schedule runs
schema.yml file?
Solution
Step 1: Recall the schema.yml syntax for listing a test
In schema.yml, a test is attached by listing its name (without .sql) as a plain entry in the tests list. Note that dbt resolves such an entry to a generic test. A singular test stored as tests/my_custom_test.sql needs no schema.yml entry, because dbt test runs every .sql file in the tests directory automatically.
Step 2: Match options to this syntax
tests: - my_custom_test is the only option with valid list syntax. Other options use incorrect structure, extra keys, or include .sql.
Final Answer:
tests: - my_custom_test -> Option D
Quick Check:
schema.yml test syntax = - test_name (no .sql) [OK]
- Using 'name' or 'test' keys
- Including .sql extension
- Using map/dict structure
tests/check_positive_values.sql:
SELECT * FROM {{ ref('orders') }} WHERE amount <= 0
What will be the output if all amounts in the orders table are positive?
Solution
Step 1: Understand the test SQL logic
The test selects rows where amount is less than or equal to zero.
Step 2: Analyze the data condition
If all amounts are positive, no rows satisfy the condition, so the query returns zero rows.
Final Answer:
An empty result with zero rows -> Option A
Quick Check:
All positive amounts means zero rows returned [OK]
- Expecting a count instead of rows
- Thinking it returns all rows
- Assuming SQL syntax error
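The behavior of this test can be simulated with Python's built-in sqlite3 module. This is a minimal sketch, not how dbt itself runs tests: the in-memory table, the sample rows, and the helper `failing_rows` are assumptions for illustration, and `{{ ref('orders') }}` is replaced by the plain table name as a stand-in for the compiled SQL.

```python
import sqlite3


def failing_rows(conn, test_sql):
    """Return the rows the test query selects; an empty list means the test passes."""
    return conn.execute(test_sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, 25.5), (3, 3.0)],  # hypothetical data, all amounts positive
)

# Stand-in for the compiled test query.
test_sql = "SELECT * FROM orders WHERE amount <= 0"

rows_all_positive = failing_rows(conn, test_sql)
print(rows_all_positive)  # no rows selected, so the test would pass

# Add one invalid row and run the same query again.
conn.execute("INSERT INTO orders VALUES (4, -5.0)")
rows_with_bad = failing_rows(conn, test_sql)
print(rows_with_bad)  # the offending row is selected, so the test would fail
```

With only positive amounts the query selects nothing. Once a non-positive amount exists, that row is returned and the test would report a failure.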
Solution
Step 1: Identify causes of SQL syntax errors
Syntax errors happen when SQL is malformed, such as a missing SELECT statement.
Step 2: Evaluate options for syntax error causes
"The SQL file is missing the required SELECT statement" directly relates to SQL syntax. Other options cause runtime or configuration errors, not syntax errors.
Final Answer:
The SQL file is missing the required SELECT statement -> Option C
Quick Check:
Syntax error = malformed SQL like missing SELECT [OK]
- Confusing missing test listing with syntax error
- Assuming zero rows cause syntax errors
- Ignoring missing model references
users table. Which SQL query should you write in your test file?
Solution
Step 1: Understand the test goal
The test should return rows where email is NULL to detect missing emails.
Step 2: Choose the SQL that returns rows with NULL emails
SELECT * FROM {{ ref('users') }} WHERE email IS NULL returns rows only when there are NULL emails (0 rows = pass). COUNT(*) always returns one row, so it fails even with zero NULLs. IS NOT NULL selects the good rows (the opposite of what we want). = '' checks for empty strings, not NULLs.
Final Answer:
SELECT * FROM {{ ref('users') }} WHERE email IS NULL -> Option A
Quick Check:
Return rows with NULL email = SELECT * FROM {{ ref('users') }} WHERE email IS NULL [OK]
- Using COUNT(*) instead of returning rows
- Checking for empty string instead of NULL
- Selecting non-NULL emails
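The difference between returning rows and returning a count can be checked with Python's built-in sqlite3 module. This is a rough sketch under stated assumptions: the in-memory `users` table and its sample rows are hypothetical, and the plain table name stands in for `{{ ref('users') }}`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com")],  # hypothetical data, no NULL emails
)

# Row-returning form: selects nothing when the data is clean.
select_rows = conn.execute(
    "SELECT * FROM users WHERE email IS NULL"
).fetchall()

# Aggregate form: always produces exactly one row, even when the count is zero.
count_rows = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email IS NULL"
).fetchall()

print(len(select_rows))  # number of rows from the row-returning query
print(len(count_rows))   # number of rows from the COUNT(*) query
```

Because a singular test fails whenever its query returns any row, the COUNT(*) form would fail on clean data, while the row-returning form passes.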
