dbt · data · ~15 mins

Store test failures for analysis in dbt - Deep Dive

Overview - Store test failures for analysis
What is it?
Storing test failures for analysis means saving information about when and why data tests fail in dbt projects. This helps teams understand data quality issues by keeping a record of errors instead of just seeing them once. It involves capturing details like which test failed, when, and what data caused the failure. This makes it easier to track problems over time and fix them effectively.
Why it matters
Without storing test failures, teams only see errors temporarily and lose valuable context about recurring data problems. This slows down debugging and can lead to bad decisions based on faulty data. By saving failure details, teams can analyze patterns, prioritize fixes, and improve data trustworthiness. It turns testing from a one-time check into a continuous learning process that supports better data-driven decisions.
Where it fits
Before this, learners should understand basic dbt testing concepts and how to write tests. After this, they can explore advanced monitoring, alerting, and automated data quality dashboards that use stored failure data to provide insights.
Mental Model
Core Idea
Storing test failures captures detailed error information over time to enable deeper analysis and continuous data quality improvement.
Think of it like...
It's like keeping a diary of all your car's warning lights and issues instead of just fixing them once and forgetting. Over time, the diary helps you spot patterns and prevent bigger breakdowns.
┌───────────────────────────────┐
│       dbt Test Runs           │
├───────────────┬───────────────┤
│ Test Result   │ Failure Info  │
├───────────────┼───────────────┤
│ Pass          │ -             │
│ Fail          │ Store details │
│               │ (time, data)  │
└───────────────┴───────────────┘
          ↓
┌───────────────────────────────┐
│ Failure Storage Table         │
│ - Test name                   │
│ - Timestamp                   │
│ - Failed rows or error info   │
└───────────────────────────────┘
          ↓
┌───────────────────────────────┐
│ Analysis & Reporting          │
│ - Identify recurring issues   │
│ - Prioritize fixes            │
│ - Improve data quality        │
└───────────────────────────────┘
Build-Up - 6 Steps
1
Foundation: Understanding dbt Test Basics
🤔
Concept: Learn what dbt tests are and how they check data quality.
dbt tests are simple checks you write to make sure your data meets expectations. For example, you can test if a column has no nulls or if values are unique. When you run dbt, it tells you which tests pass or fail.
Result
You know how to write and run basic dbt tests that check your data.
Understanding tests is the first step to improving data quality because you need to know what checks exist before analyzing failures.
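The basics above are declared in a model's YAML file. A minimal sketch, where the model name `orders` and column `order_id` are illustrative, not from any particular project:

```yaml
# models/schema.yml -- hypothetical model and column names
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - not_null   # fails if any row has a null order_id
          - unique     # fails if any order_id appears more than once
```

Running `dbt test --select orders` executes both checks and reports pass or fail for each.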
2
Foundation: What Happens When Tests Fail
🤔
Concept: Learn how dbt reports test failures and what information is available.
When a test fails, dbt shows an error message and sometimes the rows causing the failure. However, this info is only visible during the run and not saved anywhere by default.
Result
You see that test failures are temporary and not stored automatically.
Knowing that failure info disappears after runs highlights the need to store it for later analysis.
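Under the hood, a test is just a query that selects the offending rows. Roughly, a `not_null` test on the hypothetical `orders.order_id` column compiles to something like:

```sql
-- Approximate compiled form of a not_null test (schema/table names illustrative).
-- dbt counts the rows this query returns; any rows at all mean the test fails,
-- but by default those rows are discarded once the run finishes.
select order_id
from analytics.orders
where order_id is null
```

Because the result set is thrown away, the failing rows are gone unless you deliberately persist them.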
3
Intermediate: Creating a Table to Store Failures
🤔 Before reading on: do you think storing failures requires changing dbt core code, or can it be done with models? Commit to your answer.
Concept: Introduce the idea of creating a dedicated table in your warehouse to save failure details.
You can create a dbt model that captures failed test results and inserts them into a table. This table can store test names, timestamps, and failed rows or error details. This way, failure info is saved after each run.
Result
A persistent table exists that accumulates test failure data over time.
Understanding that you can use dbt models to store failures avoids complex tooling and leverages your existing warehouse.
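Modern dbt versions also ship a built-in `store_failures` config that writes each test's failing rows to an audit table in your warehouse, which can serve as the raw input for a custom storage table. A sketch of enabling it project-wide (the `test_failures` suffix is an illustrative choice):

```yaml
# dbt_project.yml
tests:
  +store_failures: true     # persist each test's failing rows in the warehouse
  +schema: test_failures    # optional: custom suffix for the audit schema
```

One caveat worth knowing: these audit tables are rebuilt on every run, so they hold only the latest failures. Accumulating history over time still requires your own table, as described above.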
4
Intermediate: Capturing Failure Details with Custom SQL
🤔 Before reading on: do you think dbt test failures can be captured automatically, or do you need custom SQL? Commit to your answer.
Concept: Learn how to write SQL queries that extract failure details from test results.
You write SQL that selects failed rows from test output tables or logs and inserts them into your failure storage table. This requires understanding how dbt test results are structured and accessed.
Result
Failure details like which rows failed and when are stored in a structured way.
Knowing how to extract failure data with SQL empowers you to customize what info you keep for analysis.
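One way to do this is to append from the audit table that `store_failures` writes into your own accumulating log before the next run overwrites it. A minimal sketch, assuming `store_failures` is enabled and using dbt's default audit schema naming (the test name is hypothetical):

```sql
-- models/test_failure_log.sql -- hypothetical incremental model.
-- Each run appends a timestamped snapshot of one test's stored failures.
{{ config(materialized='incremental') }}

select
    'not_null_orders_order_id' as test_name,
    current_timestamp          as captured_at,
    order_id                   as failed_value
from {{ target.schema }}_dbt_test__audit.not_null_orders_order_id
```

This model would need to run after the tests (for example via `dbt run --select test_failure_log` following `dbt test`), which is exactly the sequencing problem the next step automates.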
5
Advanced: Automating Failure Storage in dbt Runs
🤔 Before reading on: do you think storing failures can be fully automated within dbt runs, or does it require manual steps? Commit to your answer.
Concept: Explore ways to automate failure data capture as part of your dbt workflow.
You can add post-hook SQL or separate models that run after tests to insert failure info into your storage table automatically. This ensures every run updates the failure log without manual intervention.
Result
Failure storage happens automatically every time you run dbt tests.
Automating failure capture saves time and ensures consistent data quality tracking.
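For example, dbt exposes a `results` object to `on-run-end` hooks, and a macro can loop over it and insert one row per failed test. A sketch, assuming a pre-created `audit.test_failure_log` table (name hypothetical):

```sql
-- macros/log_test_failures.sql
{% macro log_test_failures(results) %}
  {% if execute %}
    {% for result in results %}
      {# Only log nodes that are tests and actually failed #}
      {% if result.node.resource_type == 'test' and result.status == 'fail' %}
        {% do run_query(
          "insert into audit.test_failure_log (test_name, run_at, failure_count) "
          ~ "values ('" ~ result.node.name ~ "', current_timestamp, "
          ~ result.failures ~ ")"
        ) %}
      {% endif %}
    {% endfor %}
  {% endif %}
{% endmacro %}
```

Wiring it up is one line in `dbt_project.yml`: `on-run-end: "{{ log_test_failures(results) }}"`. After that, every `dbt test` or `dbt build` invocation appends to the log with no manual steps.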
6
Expert: Analyzing Failure Trends for Data Quality
🤔 Before reading on: do you think stored failure data is only useful for fixing immediate errors, or can it reveal deeper patterns? Commit to your answer.
Concept: Use stored failure data to find recurring issues and improve data quality over time.
By querying your failure storage table, you can identify tests that fail often, times when failures spike, or data sources causing problems. This analysis helps prioritize fixes and monitor improvements.
Result
You gain insights into data quality trends and can make informed decisions to improve your data.
Understanding that failure storage enables strategic data quality management transforms testing from reactive to proactive.
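A trend query over the accumulated log might look like the following sketch (table and column names continue the hypothetical `audit.test_failure_log` from earlier; interval syntax varies by warehouse):

```sql
-- Which tests fail most often over the last 30 days?
select
    test_name,
    count(*)           as failed_runs,
    sum(failure_count) as total_failing_rows,
    max(run_at)        as last_failure
from audit.test_failure_log
where run_at >= current_date - interval '30' day
group by test_name
order by failed_runs desc
```

Tests at the top of this list are the recurring problems worth fixing at the source rather than patching run by run.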
Under the Hood
When dbt runs tests, it executes SQL queries that check data conditions. If a test fails, dbt captures the failing rows or error messages temporarily in memory or logs. By creating models or hooks that run after tests, you can query these failure outputs and insert them into a dedicated table in your data warehouse. This table accumulates failure records over time, indexed by test name and timestamp, allowing historical analysis.
Why designed this way?
dbt was designed to keep tests simple and focused on immediate feedback, not long-term failure storage. This keeps core dbt lightweight and flexible. Storing failures separately lets teams customize what and how much info to keep based on their needs. Alternatives like built-in failure logs would add complexity and storage overhead for all users, so the current design balances simplicity with extensibility.
┌───────────────┐       ┌───────────────┐       ┌──────────────────────┐
│ dbt Test Run  │──────▶│ Test Execution│──────▶│ Failure Output (temp)│
└───────────────┘       └───────────────┘       └─────────┬────────────┘
                                                          │
                                                          ▼
                                                ┌──────────────────────┐
                                                │ Failure Storage Table│
                                                │ (persistent in DW)   │
                                                └─────────┬────────────┘
                                                          │
                                                          ▼
                                                ┌──────────────────────┐
                                                │ Analysis & Reporting │
                                                └──────────────────────┘
Myth Busters - 3 Common Misconceptions
Quick: Do you think dbt automatically saves all test failures for later analysis? Commit yes or no.
Common Belief: dbt automatically stores all test failures in a database table for you.
Reality: dbt only shows test failures during runs and does not save them permanently by default.
Why it matters: Assuming automatic storage leads to missing historical failure data and losing insights into recurring data issues.
Quick: Do you think storing every failed row from tests is always a good idea? Commit yes or no.
Common Belief: Storing every failed row from tests is always beneficial and has no downsides.
Reality: Storing all failed rows can cause large storage use and slow queries; selective or summarized storage is often better.
Why it matters: Ignoring storage costs and performance can make failure analysis impractical at scale.
Quick: Do you think you need to modify dbt core code to store test failures? Commit yes or no.
Common Belief: You must change dbt's internal code to save test failures for analysis.
Reality: You can store failures using dbt models, hooks, and SQL without modifying dbt itself.
Why it matters: Believing core changes are needed discourages teams from implementing practical failure storage solutions.
Expert Zone
1
Failure storage design must balance detail and storage cost; storing full failed rows is not always optimal.
2
Automating failure capture with post-hooks can introduce dependencies and timing issues if not carefully managed.
3
Analyzing failure trends requires consistent test naming and metadata to correlate failures across runs.
When NOT to use
Storing test failures is not ideal for very small projects with few tests or when real-time alerting is more critical than historical analysis. In such cases, lightweight monitoring tools or direct alerting systems may be better.
Production Patterns
Teams often build dedicated failure tables with partitioning by date for efficient querying. They automate failure capture with dbt post-hooks and integrate failure data into BI dashboards for data quality monitoring. Some use failure data to trigger automated tickets or notifications for faster resolution.
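A date-partitioned failure table of the kind described might be created like this on BigQuery (DDL syntax varies by warehouse; the schema and column names are hypothetical):

```sql
-- Partitioning by run date keeps "recent failures" queries cheap at scale.
create table if not exists audit.test_failure_log (
    test_name     string,
    run_at        timestamp,
    failure_count int64
)
partition by date(run_at)
```

Dashboards and alerting jobs can then filter on recent partitions without scanning the full history.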
Connections
Data Observability
Builds on
Storing test failures is a key part of data observability, which monitors data health continuously to prevent issues.
Incident Management
Similar pattern
Just like incident logs in IT operations track system failures for analysis, storing test failures tracks data issues for resolution.
Quality Control in Manufacturing
Analogous process
Recording defects over time in manufacturing helps improve processes, just as storing test failures helps improve data pipelines.
Common Pitfalls
#1 Not storing failure details leads to loss of historical error context.
Wrong approach: Run dbt tests and only read failure messages in the console without saving them.
Correct approach: Create a failure storage table and insert failure details after each test run using dbt models or hooks.
Root cause: Assuming test failures are only relevant during the run and ignoring long-term analysis benefits.
#2 Storing too much raw failure data causes storage bloat and slow queries.
Wrong approach: Insert every failed row with all columns into the failure table without filtering or summarizing.
Correct approach: Store only key columns or aggregate failure info to balance detail and performance.
Root cause: Not considering storage costs and query efficiency when designing failure storage.
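For instance, instead of copying every failing row verbatim, a per-run summary keeps the table small. A sketch as a dbt model, with the model name `stg_orders` and the accepted status values purely illustrative:

```sql
-- Hypothetical summary model: one row per run instead of every failed row.
select
    'orders_status_accepted_values' as test_name,
    current_timestamp               as run_at,
    count(*)                        as failure_count,
    min(status)                     as sample_bad_value  -- keep one example, not all rows
from {{ ref('stg_orders') }}
where status not in ('placed', 'shipped', 'returned')
```

A count plus one sample value is usually enough to spot and debug a recurring issue without storing thousands of duplicate rows.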
#3 Manually running failure storage steps leads to inconsistent data.
Wrong approach: Run separate SQL scripts manually after tests to save failures.
Correct approach: Automate failure storage with dbt post-hooks or models that run every time tests run.
Root cause: Underestimating the value of automation for consistent and reliable failure tracking.
Key Takeaways
Storing test failures transforms temporary error messages into valuable historical data for analysis.
You can use dbt models and hooks to capture and save failure details without changing dbt core.
Balancing detail and storage cost is crucial when designing failure storage solutions.
Automating failure capture ensures consistent data quality tracking and saves manual effort.
Analyzing stored failures helps identify recurring issues and prioritize data quality improvements.