Testing model outputs in dbt - Time & Space Complexity
Start learning this pattern below
Jump into concepts and practice - no test required
When testing model outputs in dbt, we want to know how the time to run tests changes as data grows.
We ask: How does the test execution time grow with more data?
Analyze the time complexity of the following dbt test code.
-- test to check for nulls in a column
select
count(*) as null_count
from {{ ref('my_model') }}
where important_column is null
This test counts how many rows have nulls in a specific column of a model.
Look for repeated work in the test query.
- Primary operation: Scanning all rows in the model table.
- How many times: Once per test run, but it checks every row.
The test reads every row to count nulls, so more rows mean more work.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 row checks |
| 100 | 100 row checks |
| 1000 | 1000 row checks |
Pattern observation: The work grows directly with the number of rows.
Time Complexity: O(n)
This means the test time grows linearly as the data size grows.
[X] Wrong: "The test only checks a few rows, so it runs in constant time."
[OK] Correct: The test scans every row to count nulls, so it must look at all data, making time grow with data size.
Understanding how test queries scale helps you write efficient checks and explain their impact clearly in real projects.
"What if the test checked for nulls only in a small indexed subset of rows? How would the time complexity change?"
Practice
Solution
Step 1: Understand the goal of testing in dbt
Testing checks if the data produced by models is correct and trustworthy.Step 2: Identify the main benefit of testing outputs
Accurate and reliable data helps users make good decisions and trust reports.Final Answer:
To ensure the data is accurate and reliable -> Option AQuick Check:
Testing = Accurate data [OK]
- Thinking tests speed up loading
- Confusing testing with table creation
- Assuming tests delete data
user_id in a dbt model's schema.yml file?Solution
Step 1: Check the correct key for model name
The key to specify the model isname, notmodel.Step 2: Verify column and test syntax
Each column usesnameand tests are listed undertestsas a list.Final Answer:
- name: users columns: - name: user_id tests: - unique -> Option DQuick Check:
Correct schema.yml syntax = - name: users columns: - name: user_id tests: - unique [OK]
- Using 'model' instead of 'name' for model
- Writing 'test' instead of 'tests'
- Omitting 'name' for column
schema.yml for the model orders:
- name: orders
columns:
- name: order_id
tests:
- unique
- not_null
What will happen if the orders table has two rows with the same order_id and one row with order_id as NULL when you run dbt test?Solution
Step 1: Understand the tests applied
The tests areuniqueandnot_nullonorder_id.Step 2: Analyze the data issues
Two rows have the sameorder_id(violates uniqueness) and one row has NULLorder_id(violates not_null).Final Answer:
The test will fail because of duplicate and NULL values in order_id -> Option AQuick Check:
Duplicates + NULLs = test fail [OK]
- Assuming NULLs are allowed in unique tests
- Thinking only one test failure causes pass
- Ignoring NULL violation in not_null test
schema.yml file:
- name: customers
columns:
- name: email
tests:
- unique
- not_null
But when you run dbt test, you get an error saying Invalid test configuration. What is the likely cause?Solution
Step 1: Check the structure of schema.yml
Tests must be defined under themodels:key inschema.yml.Step 2: Identify missing
The snippet misses themodels:keymodels:root key, causing invalid configuration.Final Answer:
The model name should be undermodelskey inschema.yml-> Option CQuick Check:
Missing 'models:' key = config error [OK]
- Removing dashes from test names
- Incorrect indentation of tests list
- Not placing model under 'models:'
status column in your transactions model only contains the values 'pending', 'completed', or 'failed'. Which test definition in schema.yml correctly enforces this?Solution
Step 1: Identify the correct test for allowed values
Theaccepted_valuestest checks if column values are in a list.Step 2: Check correct syntax for accepted_values
The test requires a dictionary with keyvalueslisting allowed values.Final Answer:
- name: transactions columns: - name: status tests: - accepted_values: values: ['pending', 'completed', 'failed'] -> Option BQuick Check:
accepted_values with 'values' key = - name: transactions columns: - name: status tests: - accepted_values: values: ['pending', 'completed', 'failed'] [OK]
- Using unique or not_null instead of accepted_values
- Omitting 'values:' key under accepted_values
- Using wrong test name like values_in
