dbt · data · ~10 mins

Why testing ensures data quality in dbt - Visual Breakdown

Concept Flow - Why testing ensures data quality
Define data tests → Run tests on data → Check test results
  • All checks pass → Data is trusted
  • A check fails → Improve data quality → Re-run tests
This flow shows how defining and running tests catches errors early. When every check passes, the data can be trusted. When a check fails, the issue is fixed and the tests are re-run.
Execution Sample
dbt
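-- test to find missing emails
select * from users where email is null

-- test to find future order dates
select * from orders where order_date > current_date
These tests check for missing emails and future order dates, both signs of data quality problems. In dbt, a test query selects the rows that break a rule, and the test passes only when the query returns zero rows. That is why both queries select the offending rows instead of counting them: a count(*) query always returns one row, so dbt would report it as a failure even when the count is 0. The queries also omit a trailing semicolon, because dbt wraps the test query inside its own SQL when it runs it.
Execution Table
Step | Test query | Test purpose | Result | Action
1 | select * from users where email is null | Check for missing emails | 2 rows found | Fail: investigate missing emails
2 | select * from orders where order_date > current_date | Check for future order dates | 0 rows found | Pass: no future orders
3 | Re-run tests after fix | Verify fixes | 0 rows found | Pass: data quality improved
💡 The test, fix, and re-run cycle ends when every check passes, which confirms that the data meets the rules the tests encode.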
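The missing-email check is common enough that dbt ships a built-in generic test for it, `not_null`, so it can be declared in a properties YAML file instead of hand-written SQL. A minimal sketch, assuming `users` is a dbt model in the project and the file sits in the models directory (the name `schema.yml` is only a convention):

```yaml
version: 2

models:
  - name: users        # assumed model name, taken from the sample query
    columns:
      - name: email
        tests:
          - not_null   # fails if any row has a null email
```

Running `dbt test` turns this into a query much like the missing-email query in the sample and reports a failure when it returns rows. The future-order-date rule has no built-in generic test, so it would stay a singular test: a `.sql` file in the project's `tests` directory containing the select statement.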
Variable Tracker
Variable | Start | After Test 1 | After Fix | Final
missing_emails_count | unknown | 2 | 0 | 0
future_orders_count | unknown | 0 | 0 | 0
Key Moments - 2 Insights
Why do we run tests before trusting data?
Because tests reveal problems such as missing or incorrect data early, as in step 1 of the Execution Table, where rows with missing emails were found.
What happens if a test fails?
We investigate and fix the data issues, then re-run the tests to confirm the fix, as in steps 1 and 3 of the Execution Table.
Visual Quiz - 3 Questions
Test your understanding
Looking at the Execution Table, what was the result of the test that checks for future order dates?
A. 2 rows found
B. Test not run
C. 0 rows found
D. Error in query
💡 Hint
Check row 2 of the Execution Table, in the Result column.
At which step did the missing emails count become zero?
A. After Test 1
B. After Fix
C. Start
D. Final
💡 Hint
Look at the missing_emails_count row in the Variable Tracker.
If the missing-emails test still found 1 row after the fix, what would happen next?
A. Investigate and fix again
B. Stop testing
C. Data is trusted
D. Ignore the test
💡 Hint
Refer to the Concept Flow, where a failing test leads to investigation and fixing.
Concept Snapshot
Testing in dbt means writing queries that look for rows breaking a data rule; a test passes when no such rows are found.
Run tests regularly to catch problems early.
If tests fail, fix data and re-run tests.
Passing tests mean data is trusted and high quality.
Testing ensures reliable data for decisions.
Full Transcript
Testing ensures data quality by running queries that check for data problems like missing or incorrect values. When tests find issues, we fix them and run tests again to confirm the fix. This cycle helps keep data trustworthy and reliable for analysis and decisions.

Practice

1. Why is testing important in dbt for data quality?
easy
A. It automatically checks if data meets expected rules.
B. It speeds up data loading into the warehouse.
C. It creates visual reports for data trends.
D. It deletes old data to save space.

Solution

  1. Step 1: Understand the purpose of testing in dbt

    Testing in dbt is designed to check if data follows certain rules or expectations automatically.
  2. Step 2: Compare options with testing goals

    Only option A describes automatically checking whether data is correct, which matches the role of testing.
  3. Final Answer:

    It automatically checks if data meets expected rules. -> Option A
  4. Quick Check:

    Testing = automatic data checks [OK]
Hint: Testing means automatic checks for data correctness [OK]
Common Mistakes:
  • Confusing testing with data loading speed
  • Thinking testing creates visual reports
  • Assuming testing deletes data
2. Which of the following is the correct syntax to add a test in a dbt model's YAML file?
easy
A. tests: - unique: column_name
B. test: unique column_name
C. tests: unique(column_name)
D. test: - unique: column_name

Solution

  1. Step 1: Recall dbt YAML test syntax

    In dbt, tests are added under the plural 'tests' key as a YAML list, with one dash-prefixed entry per test.
  2. Step 2: Match syntax with options

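    Option A is the only choice that uses the plural 'tests:' key followed by a dash-prefixed list entry, '- unique: column_name'. In a complete file, the column is passed either through a 'columns:' entry (as in question 5) or as a 'column_name:' argument nested under the test.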
  3. Final Answer:

    tests: - unique: column_name -> Option A
  4. Quick Check:

    YAML tests list = tests: - unique: column_name [OK]
Hint: Tests in YAML use 'tests:' with dash list [OK]
Common Mistakes:
  • Using 'test' instead of 'tests'
  • Missing dash '-' before test name
  • Incorrect parentheses usage
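Option A is a flattened placeholder. When a generic test such as `unique` is listed directly under a model's `tests:` key, dbt expects its arguments as a nested mapping, with the column passed as `column_name`. A sketch of that model-level form, where `users` and `id` are hypothetical names:

```yaml
version: 2

models:
  - name: users            # hypothetical model name
    tests:
      - unique:
          column_name: id  # argument: the column the test checks
```

The more common placement is a `columns:` entry for the column, with the test listed in that column's own `tests:` list.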
3. Given this dbt test result output:
{"failures": 3, "total_tests": 5}

What does this mean about the data quality?
medium
A. No tests were run on the data.
B. All tests passed, data is perfect.
C. 5 tests failed, data is unusable.
D. 3 tests failed, indicating some data issues.

Solution

  1. Step 1: Interpret test result fields

    'failures' shows how many tests failed; 'total_tests' shows how many tests were run.
  2. Step 2: Analyze given numbers

    3 failures out of 5 means some tests failed: the data has issues, but not every test failed.
  3. Final Answer:

    3 tests failed, indicating some data issues. -> Option D
  4. Quick Check:

    failures = 3 means some errors [OK]
Hint: Failures number shows how many tests found problems [OK]
Common Mistakes:
  • Assuming failures means all tests failed
  • Thinking zero failures means errors
  • Ignoring total_tests count
4. You wrote this test in your dbt model YAML:
tests:
  - not_null: id
  - unique: id

But dbt throws an error when running tests. What is the likely problem?
medium
A. The tests list is missing a dash before 'not_null'.
B. The tests should be under 'columns', not directly under 'tests'.
C. The test names 'not_null' and 'unique' are invalid.
D. The YAML file must be named 'schema.yml' to run tests.

Solution

  1. Step 1: Recall correct YAML structure for dbt tests

    Tests on a specific column are normally nested under a 'columns:' entry that names the column, rather than listed directly under the model's 'tests:' key.
  2. Step 2: Identify error cause

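    The file is valid YAML, so this is not a syntax error. The problem is that dbt expects each test's arguments as a mapping, and '- not_null: id' gives it a bare string instead, so dbt rejects the test when it parses the file. Moving the tests under 'columns:', with 'name: id' and its own 'tests:' list, fixes the error.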
  3. Final Answer:

    The tests should be under 'columns', not directly under 'tests'. -> Option B
  4. Quick Check:

    Tests belong under columns key [OK]
Hint: Tests on columns go under 'columns:' in YAML [OK]
Common Mistakes:
  • Putting tests directly under 'tests:' without 'columns:'
  • Using wrong test names
  • Wrong YAML file naming
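Here is the question's file rewritten with the fix from option B. The enclosing `version: 2` and `models:` keys and the model name `my_model` are assumptions, since the question shows only the `tests:` fragment.

```yaml
version: 2

models:
  - name: my_model   # hypothetical model name
    columns:
      - name: id     # the column both tests check
        tests:
          - not_null
          - unique
```

Each test name is now a plain list item with no arguments, and the column comes from the enclosing `name: id` entry.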
5. You want to ensure no duplicate emails exist in your users table using dbt tests. Which YAML snippet correctly applies this test?
hard
A. columns: - email: tests: - unique
B. tests: - unique: email
C. columns: - name: email tests: - unique
D. columns: - name: email test: unique

Solution

  1. Step 1: Recall correct YAML format for column tests

    Column tests are listed under 'columns:'; each column entry has a 'name' key and a 'tests' list.
  2. Step 2: Match options with correct syntax

    Option C correctly uses 'columns:', '- name: email', and 'tests:' followed by '- unique'.
  3. Final Answer:

    columns: - name: email tests: - unique -> Option C
  4. Quick Check:

    Correct YAML structure = columns: - name: email tests: - unique [OK]
Hint: Use 'columns:' with 'name' and 'tests:' list [OK]
Common Mistakes:
  • Using 'test' instead of 'tests'
  • Missing 'name:' key for column
  • Placing tests outside 'columns:'
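Option C is printed on one line, but in a real properties file the indentation carries the nesting. A sketch of the full file, assuming the model is named `users`:

```yaml
version: 2

models:
  - name: users        # assumed model name
    columns:
      - name: email    # option A instead makes "email" a key and drops "name:"
        tests:
          - unique     # fails if any non-null email appears more than once
```

dbt 1.8 and later also accept `data_tests:` in place of `tests:`; the older key continues to work.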