Jump into concepts and practice - no test required
or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Why testing ensures data quality
📖 Scenario: You work as a data analyst in a company that uses dbt to manage data transformations. Your team wants to make sure the data is accurate and reliable before using it for reports.
🎯 Goal: You will create a simple dbt model and add tests to check data quality. This will help catch errors early and keep data trustworthy.
📋 What You'll Learn
Create a dbt model with sample data
Add a config variable to set a threshold
Write a test to check for null values in a column
Write a test to check that values meet the threshold
Display the test results
💡 Why This Matters
🌍 Real World
Data teams use testing in dbt to ensure data pipelines produce accurate and clean data before reports or dashboards use it.
💼 Career
Knowing how to write and run tests in dbt is a key skill for data analysts and engineers to maintain high data quality.
Progress0 / 4 steps
1
Create a dbt model with sample data
Create a dbt model file called models/sample_data.sql with a table that has columns id and value. Insert these exact rows: (1, 10), (2, 20), (3, NULL), (4, 40).
dbt
Hint
Use a CTE with union all to create the rows.
2
Add a config variable for minimum value threshold
In the same model file models/sample_data.sql, add a config variable called min_value_threshold and set it to 15.
dbt
Hint
Use the {{ config() }} Jinja function to add variables.
3
Add tests to check data quality
Create two tests in tests folder: one test called test_no_null_values.sql that checks value column has no nulls, and another test called test_value_above_threshold.sql that checks all value entries are greater than or equal to min_value_threshold variable.
dbt
Hint
Write SQL queries that return rows violating the test condition.
4
Run tests and display results
Run the dbt tests and print the summary output showing which tests passed or failed.
dbt
Hint
Use the dbt test command in your terminal to run tests and see results.
Practice
(1/5)
1. Why is testing important in dbt for data quality?
easy
A. It automatically checks if data meets expected rules.
B. It speeds up data loading into the warehouse.
C. It creates visual reports for data trends.
D. It deletes old data to save space.
Solution
Step 1: Understand the purpose of testing in dbt
Testing in dbt is designed to check if data follows certain rules or expectations automatically.
Step 2: Compare options with testing goals
Only It automatically checks if data meets expected rules. describes automatic checking of data correctness, which matches testing's role.
Final Answer:
It automatically checks if data meets expected rules. -> Option A
Quick Check:
Testing = automatic data checks [OK]
Hint: Testing means automatic checks for data correctness [OK]
Common Mistakes:
Confusing testing with data loading speed
Thinking testing creates visual reports
Assuming testing deletes data
2. Which of the following is the correct syntax to add a test in a dbt model's YAML file?
easy
A. tests: - unique: column_name
B. test: unique column_name
C. tests: unique(column_name)
D. test: - unique: column_name
Solution
Step 1: Recall dbt YAML test syntax
In dbt, tests are added under the 'tests' key as a list with test name and column.
Step 2: Match syntax with options
tests: - unique: column_name correctly shows 'tests:' followed by '- unique: column_name' which is valid YAML for dbt tests.
Final Answer:
tests: - unique: column_name -> Option A
Quick Check:
YAML tests list = tests: - unique: column_name [OK]
Hint: Tests in YAML use 'tests:' with dash list [OK]
Common Mistakes:
Using 'test' instead of 'tests'
Missing dash '-' before test name
Incorrect parentheses usage
3. Given this dbt test result output:
{"failures": 3, "total_tests": 5}
What does this mean about the data quality?
medium
A. No tests were run on the data.
B. All tests passed, data is perfect.
C. 5 tests failed, data is unusable.
D. 3 tests failed, indicating some data issues.
Solution
Step 1: Interpret test result fields
'failures' shows how many tests failed; 'total_tests' is total run.
Step 2: Analyze given numbers
3 failures out of 5 means some tests failed, so data has issues but not all tests failed.
Final Answer:
3 tests failed, indicating some data issues. -> Option D
Quick Check:
failures = 3 means some errors [OK]
Hint: Failures number shows how many tests found problems [OK]
Common Mistakes:
Assuming failures means all tests failed
Thinking zero failures means errors
Ignoring total_tests count
4. You wrote this test in your dbt model YAML:
tests:
- not_null: id
- unique: id
But dbt throws an error when running tests. What is the likely problem?
medium
A. The tests list is missing a dash before 'not_null'.
B. The tests should be under 'columns', not directly under 'tests'.
C. The test names 'not_null' and 'unique' are invalid.
D. The YAML file must be named 'schema.yml' to run tests.
Solution
Step 1: Recall correct YAML structure for dbt tests
Tests on columns must be nested under 'columns:' key, not directly under 'tests:'.
Step 2: Identify error cause
Placing tests directly under 'tests:' causes syntax error; they belong under 'columns:' with column name and tests list.
Final Answer:
The tests should be under 'columns', not directly under 'tests'. -> Option B
Quick Check:
Tests belong under columns key [OK]
Hint: Tests on columns go under 'columns:' in YAML [OK]
Common Mistakes:
Putting tests directly under 'tests:' without 'columns:'
Using wrong test names
Wrong YAML file naming
5. You want to ensure no duplicate emails exist in your users table using dbt tests. Which YAML snippet correctly applies this test?
hard
A. columns:
- email:
tests:
- unique
B. tests:
- unique: email
C. columns:
- name: email
tests:
- unique
D. columns:
- name: email
test: unique
Solution
Step 1: Recall correct YAML format for column tests
Tests are listed under 'columns:', each with 'name' and 'tests' list.
Step 2: Match options with correct syntax
columns:
- name: email
tests:
- unique correctly uses 'columns:', '- name: email', and 'tests:' with '- unique'.
Final Answer:
columns:
- name: email
tests:
- unique -> Option C