Why testing ensures data quality in dbt - Visual Breakdown

Concept Flow - Why testing ensures data quality
Define data tests -> Run tests on data -> Check test results
  If tests pass: Data is trusted
  If tests fail: Improve data quality -> Re-run tests
This flow shows how defining and running tests on data helps catch errors early, leading to trusted data or fixing issues to improve quality.
Execution Sample
dbt
-- test to find missing emails (fails if any row is returned)
select * from users where email is null

-- test to find future order dates (fails if any row is returned)
select * from orders where order_date > current_date
These tests check for missing emails and future order dates, which indicate data quality problems.
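The pass/fail rule behind these queries can be tried without a dbt project. The sketch below is only an illustration: it uses Python's built-in sqlite3 module in place of a real warehouse, and the table contents are hypothetical sample rows invented for this example. It assumes each test is written to select the offending rows, so a test passes when its query returns zero rows.

```python
import sqlite3

# Hypothetical sample data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table users (id integer, email text);
    insert into users values (1, 'a@example.com'), (2, null), (3, null);
    create table orders (id integer, order_date text);
    insert into orders values (10, '2000-01-01'), (11, '2000-06-15');
""")

# Each test is a query that selects the rows violating an expectation.
tests = {
    "missing_emails": "select * from users where email is null",
    "future_orders": "select * from orders where order_date > current_date",
}

def failing_rows(conn, query):
    """Return the rows a test query selects; an empty list means the test passes."""
    return conn.execute(query).fetchall()

# Count the offending rows per test and report pass or fail.
results = {name: len(failing_rows(conn, query)) for name, query in tests.items()}
for name, count in results.items():
    status = "PASS" if count == 0 else "FAIL"
    print(f"{name}: {count} rows found -> {status}")
```

With this sample data, the missing-emails test fails and the future-orders test passes.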
Execution Table
| Step | Test Query | Test Purpose | Result | Action |
|------|------------|--------------|--------|--------|
| 1 | select * from users where email is null | Check for missing emails | 2 rows found | Fail - investigate missing emails |
| 2 | select * from orders where order_date > current_date | Check for future order dates | 0 rows found | Pass - no future orders |
| 3 | Re-run tests after fix | Verify fixes | 0 rows found | Pass - data quality improved |
💡 The cycle ends once every check passes, which confirms the data meets the tested expectations.
Variable Tracker
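| Variable | Start | After Test 1 | After Fix | Final |
|----------|-------|--------------|-----------|-------|
| missing_emails_count | unknown | 2 | 0 | 0 |
| future_orders_count | unknown | 0 | 0 | 0 |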
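The missing_emails_count row can be reproduced with a short sketch. As an assumption, sqlite3 stands in for the warehouse and the rows are hypothetical. The "fix" here is a placeholder update that stands in for whatever repair the investigation leads to. In practice the repair would usually happen in the upstream source, not by inventing values.

```python
import sqlite3

# Hypothetical sample data: two of three users have no email.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table users (id integer, email text);
    insert into users values (1, 'a@example.com'), (2, null), (3, null);
""")

MISSING_EMAILS = "select * from users where email is null"

def count_failures(conn, query):
    """Number of rows the test query returns; zero means the test passes."""
    return len(conn.execute(query).fetchall())

# Before any test runs, the count is not known.
tracker = {"Start": "unknown"}

# First run: the test reveals the problem.
tracker["After Test 1"] = count_failures(conn, MISSING_EMAILS)

# Placeholder fix (an assumption for this sketch, not a recommended repair).
conn.execute(
    "update users set email = 'user' || id || '@example.com' where email is null"
)
tracker["After Fix"] = count_failures(conn, MISSING_EMAILS)

# Re-run the test to confirm the fix held.
tracker["Final"] = count_failures(conn, MISSING_EMAILS)

print(tracker)
```

The count only becomes known once the first test runs, and it drops to zero only after the fix is applied and confirmed by the re-run.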
Key Moments - 2 Insights
Why do we run tests before trusting data?
Because tests reveal problems such as missing or incorrect data early, as in step 1 of the Execution Table, where 2 rows with missing emails were found.
What happens if a test fails?
We investigate and fix the data issue, then re-run the tests to confirm the fix, as in steps 1 and 3 of the Execution Table.
Visual Quiz - 3 Questions
Test your understanding
1. Look at the Execution Table. What was the result of the test checking for future order dates?
A. 2 rows found
B. Test not run
C. 0 rows found
D. Error in query
💡 Hint: Check row 2 of the Execution Table under the Result column.
2. At which step did the missing emails count become zero?
A. After Test 1
B. After Fix
C. Start
D. Final
💡 Hint: Look at the missing_emails_count row in the Variable Tracker.
3. If the test for missing emails still found 1 row after the fix, what would happen next?
A. Investigate and fix again
B. Stop testing
C. Data is trusted
D. Ignore the test
💡 Hint: Refer to the Concept Flow, where failing tests lead to investigation and fixing.
Concept Snapshot
Testing in dbt means writing queries that check data for errors.
Run tests regularly to catch problems early.
If tests fail, fix the data and re-run the tests.
Passing tests mean the data meets the expectations you tested, so it can be trusted.
Testing ensures reliable data for decisions.
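One way to act on this snapshot is to make downstream work wait until every test passes. The sketch below is a hypothetical helper, not part of dbt: `run_tests`, `require_passing`, and `DataTestFailure` are names invented for this example, and sqlite3 with made-up rows stands in for the warehouse.

```python
import sqlite3

class DataTestFailure(Exception):
    """Raised when at least one test query returns rows."""

def run_tests(conn, tests):
    """Run every test and return {test name: number of failing rows}."""
    return {name: len(conn.execute(query).fetchall()) for name, query in tests.items()}

def require_passing(conn, tests):
    """Raise DataTestFailure unless every test returns zero rows."""
    failures = {name: n for name, n in run_tests(conn, tests).items() if n > 0}
    if failures:
        raise DataTestFailure(f"failing tests: {failures}")

# Hypothetical sample data: one order is dated far in the future.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (id integer, order_date text);
    insert into orders values (10, '2000-01-01'), (11, '9999-12-31');
""")
tests = {"future_orders": "select * from orders where order_date > current_date"}

try:
    require_passing(conn, tests)
    outcome = "trusted"
except DataTestFailure as err:
    # The data is not used until the issue is fixed and the tests are re-run.
    outcome = f"blocked: {err}"

print(outcome)
```

Raising an exception instead of only printing a warning means a failing test cannot be overlooked by the code that runs next.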
Full Transcript
Testing ensures data quality by running queries that check for data problems like missing or incorrect values. When tests find issues, we fix them and run tests again to confirm the fix. This cycle helps keep data trustworthy and reliable for analysis and decisions.