dbt · data · ~15 mins

Store test failures for analysis in dbt - Deep Dive

Overview - Store test failures for analysis
What is it?
Storing test failures for analysis means saving information about when and why data tests fail in dbt projects. This helps teams understand data quality issues by keeping a record of errors instead of just seeing them once. It involves capturing details like which test failed, when, and what data caused the failure. This makes it easier to track problems over time and fix them effectively.
Why it matters
Without storing test failures, teams only see errors temporarily and lose valuable context about recurring data problems. This slows down debugging and can lead to bad decisions based on faulty data. By saving failure details, teams can analyze patterns, prioritize fixes, and improve data trustworthiness. It turns testing from a one-time check into a continuous learning process that supports better data-driven decisions.
Where it fits
Before this, learners should understand basic dbt testing concepts and how to write tests. After this, they can explore advanced monitoring, alerting, and automated data quality dashboards that use stored failure data to provide insights.
Mental Model
Core Idea
Storing test failures captures detailed error information over time to enable deeper analysis and continuous data quality improvement.
Think of it like...
It's like keeping a diary of all your car's warning lights and issues instead of just fixing them once and forgetting. Over time, the diary helps you spot patterns and prevent bigger breakdowns.
┌───────────────────────────────┐
│       dbt Test Runs           │
├───────────────┬───────────────┤
│ Test Result   │ Failure Info  │
├───────────────┼───────────────┤
│ Pass          │ -             │
│ Fail          │ Store details │
│               │ (time, data)  │
└───────────────┴───────────────┘
          ↓
┌───────────────────────────────┐
│ Failure Storage Table         │
│ - Test name                   │
│ - Timestamp                   │
│ - Failed rows or error info   │
└───────────────────────────────┘
          ↓
┌───────────────────────────────┐
│ Analysis & Reporting          │
│ - Identify recurring issues   │
│ - Prioritize fixes            │
│ - Improve data quality        │
└───────────────────────────────┘
Build-Up - 6 Steps
1
Foundation: Understanding dbt Test Basics
🤔
Concept: Learn what dbt tests are and how they check data quality.
dbt tests are simple checks you write to make sure your data meets expectations. For example, you can test if a column has no nulls or if values are unique. When you run dbt, it tells you which tests pass or fail.
Result
You know how to write and run basic dbt tests that check your data.
Understanding tests is the first step to improving data quality because you need to know what checks exist before analyzing failures.
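The basics above are declared in a model's YAML file. A minimal sketch, where the model name `orders` and column `order_id` are illustrative, not from any particular project:

```yaml
# models/schema.yml -- hypothetical model and column names
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - not_null   # fails if any row has a null order_id
          - unique     # fails if any order_id appears more than once
```

Running `dbt test --select orders` executes both checks and reports pass or fail for each.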
2
Foundation: What Happens When Tests Fail
🤔
Concept: Learn how dbt reports test failures and what information is available.
When a test fails, dbt shows an error message and sometimes the rows causing the failure. However, this info is only visible during the run and not saved anywhere by default.
Result
You see that test failures are temporary and not stored automatically.
Knowing that failure info disappears after runs highlights the need to store it for later analysis.
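Under the hood, a test is just a query that selects the offending rows. Roughly, a `not_null` test on the hypothetical `orders.order_id` column compiles to something like:

```sql
-- Approximate compiled form of a not_null test (schema/table names illustrative).
-- dbt counts the rows this query returns; any rows at all mean the test fails,
-- but by default those rows are discarded once the run finishes.
select order_id
from analytics.orders
where order_id is null
```

Because the result set is thrown away, the failing rows are gone unless you deliberately persist them.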
3
Intermediate: Creating a Table to Store Failures
🤔 Before reading on: do you think storing failures requires changing dbt core code, or can it be done with models? Commit to your answer.
Concept: Introduce the idea of creating a dedicated table in your warehouse to save failure details.
You can create a dbt model that captures failed test results and inserts them into a table. This table can store test names, timestamps, and failed rows or error details. This way, failure info is saved after each run.
Result
A persistent table exists that accumulates test failure data over time.
Understanding that you can use dbt models to store failures avoids complex tooling and leverages your existing warehouse.
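Modern dbt versions also ship a built-in `store_failures` config that writes each test's failing rows to an audit table in your warehouse, which can serve as the raw input for a custom storage table. A sketch of enabling it project-wide (the `test_failures` suffix is an illustrative choice):

```yaml
# dbt_project.yml
tests:
  +store_failures: true     # persist each test's failing rows in the warehouse
  +schema: test_failures    # optional: custom suffix for the audit schema
```

One caveat worth knowing: these audit tables are rebuilt on every run, so they hold only the latest failures. Accumulating history over time still requires your own table, as described above.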
4
Intermediate: Capturing Failure Details with Custom SQL
🤔 Before reading on: do you think dbt test failures can be captured automatically, or do you need custom SQL? Commit to your answer.
Concept: Learn how to write SQL queries that extract failure details from test results.
You write SQL that selects failed rows from test output tables or logs and inserts them into your failure storage table. This requires understanding how dbt test results are structured and accessed.
Result
Failure details like which rows failed and when are stored in a structured way.
Knowing how to extract failure data with SQL empowers you to customize what info you keep for analysis.
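One way to do this is to append from the audit table that `store_failures` writes into your own accumulating log before the next run overwrites it. A minimal sketch, assuming `store_failures` is enabled and using dbt's default audit schema naming (the test name is hypothetical):

```sql
-- models/test_failure_log.sql -- hypothetical incremental model.
-- Each run appends a timestamped snapshot of one test's stored failures.
{{ config(materialized='incremental') }}

select
    'not_null_orders_order_id' as test_name,
    current_timestamp          as captured_at,
    order_id                   as failed_value
from {{ target.schema }}_dbt_test__audit.not_null_orders_order_id
```

This model would need to run after the tests (for example via `dbt run --select test_failure_log` following `dbt test`), which is exactly the sequencing problem the next step automates.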
5
Advanced: Automating Failure Storage in dbt Runs
🤔 Before reading on: do you think storing failures can be fully automated within dbt runs, or does it require manual steps? Commit to your answer.
Concept: Explore ways to automate failure data capture as part of your dbt workflow.
You can add post-hook SQL or separate models that run after tests to insert failure info into your storage table automatically. This ensures every run updates the failure log without manual intervention.
Result
Failure storage happens automatically every time you run dbt tests.
Automating failure capture saves time and ensures consistent data quality tracking.
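For example, dbt exposes a `results` object to `on-run-end` hooks, and a macro can loop over it and insert one row per failed test. A sketch, assuming a pre-created `audit.test_failure_log` table (name hypothetical):

```sql
-- macros/log_test_failures.sql
{% macro log_test_failures(results) %}
  {% if execute %}
    {% for result in results %}
      {# Only log nodes that are tests and actually failed #}
      {% if result.node.resource_type == 'test' and result.status == 'fail' %}
        {% do run_query(
          "insert into audit.test_failure_log (test_name, run_at, failure_count) "
          ~ "values ('" ~ result.node.name ~ "', current_timestamp, "
          ~ result.failures ~ ")"
        ) %}
      {% endif %}
    {% endfor %}
  {% endif %}
{% endmacro %}
```

Wiring it up is one line in `dbt_project.yml`: `on-run-end: "{{ log_test_failures(results) }}"`. After that, every `dbt test` or `dbt build` invocation appends to the log with no manual steps.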
6
Expert: Analyzing Failure Trends for Data Quality
🤔 Before reading on: do you think stored failure data is only useful for fixing immediate errors, or can it reveal deeper patterns? Commit to your answer.
Concept: Use stored failure data to find recurring issues and improve data quality over time.
By querying your failure storage table, you can identify tests that fail often, times when failures spike, or data sources causing problems. This analysis helps prioritize fixes and monitor improvements.
Result
You gain insights into data quality trends and can make informed decisions to improve your data.
Understanding that failure storage enables strategic data quality management transforms testing from reactive to proactive.
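A trend query over the accumulated log might look like the following sketch (table and column names continue the hypothetical `audit.test_failure_log` from earlier; interval syntax varies by warehouse):

```sql
-- Which tests fail most often over the last 30 days?
select
    test_name,
    count(*)           as failed_runs,
    sum(failure_count) as total_failing_rows,
    max(run_at)        as last_failure
from audit.test_failure_log
where run_at >= current_date - interval '30' day
group by test_name
order by failed_runs desc
```

Tests at the top of this list are the recurring problems worth fixing at the source rather than patching run by run.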
Under the Hood
When dbt runs tests, it executes SQL queries that check data conditions. If a test fails, dbt captures the failing rows or error messages temporarily in memory or logs. By creating models or hooks that run after tests, you can query these failure outputs and insert them into a dedicated table in your data warehouse. This table accumulates failure records over time, indexed by test name and timestamp, allowing historical analysis.
Why designed this way?
dbt was designed to keep tests simple and focused on immediate feedback, not long-term failure storage. This keeps core dbt lightweight and flexible. Storing failures separately lets teams customize what and how much info to keep based on their needs. Alternatives like built-in failure logs would add complexity and storage overhead for all users, so the current design balances simplicity with extensibility.
┌───────────────┐       ┌───────────────┐       ┌──────────────────────┐
│ dbt Test Run  │──────▶│ Test Execution│──────▶│ Failure Output (temp)│
└───────────────┘       └───────────────┘       └─────────┬────────────┘
                                                          │
                                                          ▼
                                                ┌──────────────────────┐
                                                │ Failure Storage Table│
                                                │ (persistent in DW)   │
                                                └─────────┬────────────┘
                                                          │
                                                          ▼
                                                ┌──────────────────────┐
                                                │ Analysis & Reporting │
                                                └──────────────────────┘
Myth Busters - 3 Common Misconceptions
Quick: Do you think dbt automatically saves all test failures for later analysis? Commit yes or no.
Common Belief: dbt automatically stores all test failures in a database table for you.
Reality: dbt only shows test failures during runs and does not save them permanently by default.
Why it matters: Assuming automatic storage leads to missing historical failure data and losing insights into recurring data issues.
Quick: Do you think storing every failed row from tests is always a good idea? Commit yes or no.
Common Belief: Storing every failed row from tests is always beneficial and has no downsides.
Reality: Storing all failed rows can cause large storage use and slow queries; selective or summarized storage is often better.
Why it matters: Ignoring storage costs and performance can make failure analysis impractical at scale.
Quick: Do you think you need to modify dbt core code to store test failures? Commit yes or no.
Common Belief: You must change dbt's internal code to save test failures for analysis.
Reality: You can store failures using dbt models, hooks, and SQL without modifying dbt itself.
Why it matters: Believing core changes are needed discourages teams from implementing practical failure storage solutions.
Expert Zone
1
Failure storage design must balance detail and storage cost; storing full failed rows is not always optimal.
2
Automating failure capture with post-hooks can introduce dependencies and timing issues if not carefully managed.
3
Analyzing failure trends requires consistent test naming and metadata to correlate failures across runs.
When NOT to use
Storing test failures is not ideal for very small projects with few tests or when real-time alerting is more critical than historical analysis. In such cases, lightweight monitoring tools or direct alerting systems may be better.
Production Patterns
Teams often build dedicated failure tables with partitioning by date for efficient querying. They automate failure capture with dbt post-hooks and integrate failure data into BI dashboards for data quality monitoring. Some use failure data to trigger automated tickets or notifications for faster resolution.
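A date-partitioned failure table of the kind described might be created like this on BigQuery (DDL syntax varies by warehouse; the schema and column names are hypothetical):

```sql
-- Partitioning by run date keeps "recent failures" queries cheap at scale.
create table if not exists audit.test_failure_log (
    test_name     string,
    run_at        timestamp,
    failure_count int64
)
partition by date(run_at)
```

Dashboards and alerting jobs can then filter on recent partitions without scanning the full history.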
Connections
Data Observability
Builds on
Storing test failures is a key part of data observability, which monitors data health continuously to prevent issues.
Incident Management
Similar pattern
Just like incident logs in IT operations track system failures for analysis, storing test failures tracks data issues for resolution.
Quality Control in Manufacturing
Analogous process
Recording defects over time in manufacturing helps improve processes, just as storing test failures helps improve data pipelines.
Common Pitfalls
#1 Not storing failure details leads to loss of historical error context.
Wrong approach: Run dbt tests and only read failure messages in the console without saving them.
Correct approach: Create a failure storage table and insert failure details after each test run using dbt models or hooks.
Root cause: Assuming test failures are only relevant during the run and ignoring long-term analysis benefits.
#2 Storing too much raw failure data causes storage bloat and slow queries.
Wrong approach: Insert every failed row with all columns into the failure table without filtering or summarizing.
Correct approach: Store only key columns or aggregate failure info to balance detail and performance.
Root cause: Not considering storage costs and query efficiency when designing failure storage.
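For instance, instead of copying every failing row verbatim, a per-run summary keeps the table small. A sketch as a dbt model, with the model name `stg_orders` and the accepted status values purely illustrative:

```sql
-- Hypothetical summary model: one row per run instead of every failed row.
select
    'orders_status_accepted_values' as test_name,
    current_timestamp               as run_at,
    count(*)                        as failure_count,
    min(status)                     as sample_bad_value  -- keep one example, not all rows
from {{ ref('stg_orders') }}
where status not in ('placed', 'shipped', 'returned')
```

A count plus one sample value is usually enough to spot and debug a recurring issue without storing thousands of duplicate rows.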
#3 Manually running failure storage steps leads to inconsistent data.
Wrong approach: Run separate SQL scripts manually after tests to save failures.
Correct approach: Automate failure storage with dbt post-hooks or models that run every time tests run.
Root cause: Underestimating the value of automation for consistent and reliable failure tracking.
Key Takeaways
Storing test failures transforms temporary error messages into valuable historical data for analysis.
You can use dbt models and hooks to capture and save failure details without changing dbt core.
Balancing detail and storage cost is crucial when designing failure storage solutions.
Automating failure capture ensures consistent data quality tracking and saves manual effort.
Analyzing stored failures helps identify recurring issues and prioritize data quality improvements.