Jump into concepts and practice - no test required
or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Using dbt Built-in Tests for Data Quality
📖 Scenario: You work as a data analyst for an online store. You want to make sure your sales data is clean and reliable before using it for reports. You will use dbt built-in tests to check your data quality.
🎯 Goal: Learn how to apply dbt built-in tests: unique, not_null, accepted_values, and relationships to a sales data model.
📋 What You'll Learn
Create a dbt model with sales data
Add a unique test on order_id
Add a not_null test on customer_id
Add an accepted_values test on order_status
Add a relationships test between customer_id in sales and customers table
💡 Why This Matters
🌍 Real World
Data analysts and engineers use dbt tests to catch data errors early and maintain trust in reports.
💼 Career
Knowing dbt built-in tests is essential for roles involving data modeling, data quality, and analytics engineering.
Progress0 / 4 steps
1
Create sales data model
Create a dbt model file called sales.sql with a SELECT statement that returns these columns: order_id, customer_id, order_status. Use the exact column names.
dbt
Hint
Use select order_id, customer_id, order_status from raw.sales_data.
2
Add unique and not_null tests
In your schema.yml file, add a unique test on order_id and a not_null test on customer_id for the sales model.
dbt
Hint
Use tests: - unique under order_id and tests: - not_null under customer_id.
3
Add accepted_values test
In the schema.yml file, add an accepted_values test on order_status with accepted values ['pending', 'shipped', 'delivered', 'cancelled'] for the sales model.
dbt
Hint
Use accepted_values: values: ['pending', 'shipped', 'delivered', 'cancelled'] under order_status.
4
Add relationships test
In the schema.yml file, add a relationships test on customer_id in the sales model. It should reference the id column in the customers model.
dbt
Hint
Use relationships: to: ref('customers') field: id under customer_id.
Practice
(1/5)
1. What does the built-in unique test in dbt check for in a column?
easy
A. It checks that the column has no missing (null) values.
B. It checks that the column values exist in another table's column.
C. It checks that the column values match a predefined list of accepted values.
D. It checks that all values in the column are different with no duplicates.
Solution
Step 1: Understand the purpose of the unique test
The unique test ensures that each value in the specified column appears only once, meaning no duplicates.
Step 2: Compare with other test types
Other tests like not_null check for missing values, accepted_values check for allowed values, and relationships check for foreign key matches.
Final Answer:
It checks that all values in the column are different with no duplicates. -> Option D
Quick Check:
unique test = no duplicates [OK]
Hint: Unique means no duplicates allowed in the column [OK]
Common Mistakes:
Confusing unique with not_null test
Thinking unique checks accepted values
Mixing unique with relationships test
2. Which of the following is the correct syntax to add a not_null test on the column user_id in a dbt model YAML file?
easy
A. columns:
- name: user_id
tests:
- not_null
B. columns:
- user_id:
tests:
- not_null
C. tests:
- not_null: user_id
D. columns:
- name: user_id
test: not_null
Solution
Step 1: Recall YAML structure for dbt tests
Tests are added under the columns list, each column has a name and a tests list with test names.
Step 2: Identify correct indentation and keys
columns:
- name: user_id
tests:
- not_null correctly uses 'name' for the column and 'tests' as a list with '- not_null'. Other options have wrong keys or structure.
Final Answer:
columns:
- name: user_id
tests:
- not_null -> Option A
Quick Check:
YAML tests under columns with name and tests list [OK]
Hint: Use 'name' and 'tests' keys with proper indentation [OK]
What happens if the status column contains the value 'deleted' when you run dbt test?
medium
A. The test passes because 'deleted' is a valid string.
B. The test fails because 'deleted' is not in the accepted values list.
C. The test is skipped because accepted_values only checks for nulls.
D. The test throws a syntax error due to incorrect YAML.
Solution
Step 1: Understand accepted_values test behavior
The accepted_values test checks if all column values are within the specified list.
Step 2: Check if 'deleted' is in the list
'deleted' is not in ['active', 'inactive', 'pending'], so the test will fail.
Final Answer:
The test fails because 'deleted' is not in the accepted values list. -> Option B
Quick Check:
accepted_values rejects values outside list [OK]
Hint: Accepted_values fails if any value is outside the list [OK]
Common Mistakes:
Assuming test passes if value is a string
Confusing accepted_values with not_null
Thinking test skips unknown values
4. You wrote this test in your dbt model YAML:
columns:
- name: order_id
tests:
- relationships:
to: ref('orders')
But running dbt test gives an error. What is the most likely cause?
medium
A. The 'field' key is missing in the relationships test.
B. The 'to' value should be a string, not a ref function.
C. The relationships test requires the 'field' to be the same as the column name.
D. The 'to' value must be a table name string, not a ref function.
Solution
Step 1: Understand relationships test syntax
The relationships test requires both 'to' (target table) and 'field' (target column).
Step 2: Identify the error cause
The YAML is missing the 'field' key, causing a configuration error when running dbt test.
Final Answer:
The 'field' key is missing in the relationships test. -> Option A
Quick Check:
relationships 'to' + 'field' required [OK]
Hint: relationships test requires 'to' and 'field' keys [OK]
Common Mistakes:
Using ref() in YAML instead of table name string
Omitting the 'field' key
Assuming 'field' must match column name
5. You want to ensure that the customer_id column in your orders model is unique, not null, and only contains values that exist in the customers table's id column. Which combination of built-in tests should you add in your YAML?
hard
A. - not_null
- accepted_values:
values: [unique]
- relationships:
to: customers
field: id
B. - unique
- accepted_values:
values: [not null]
- relationships:
to: customers
field: id
C. - unique
- not_null
- relationships:
to: customers
field: id
D. - unique
- not_null
- accepted_values:
values: [customer_id]
Solution
Step 1: Identify tests for uniqueness and non-null
Use 'unique' to ensure no duplicates and 'not_null' to prevent missing values.
Step 2: Ensure foreign key relationship
Use 'relationships' test with 'to' as 'customers' table and 'field' as 'id' to check existence.
Step 3: Verify other options
Options B, C, and D misuse accepted_values or mix concepts incorrectly.
Final Answer:
- unique
- not_null
- relationships:
to: customers
field: id -> Option C