0
0
dbtdata~20 mins

Store test failures for analysis in dbt - Practice Problems & Coding Challenges

Choose your learning style9 modes available
Challenge - 5 Problems
🎖️
Test Failure Analysis Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
Predict Output
intermediate
2:00remaining
What is the output of this SQL query storing test failures?

Consider a dbt test that checks for null values in a column user_id. The following SQL stores failures in a table. What rows will be stored?

dbt
INSERT INTO test_failures (test_name, failed_row_id)
SELECT 'not_null_user_id', id
FROM users
WHERE user_id IS NULL;
ANo rows inserted because <code>id</code> is not selected.
BRows where <code>user_id</code> is NOT NULL, with test name 'not_null_user_id'.
CRows from <code>users</code> where <code>user_id</code> is NULL, with test name 'not_null_user_id'.
DAll rows from <code>users</code> regardless of <code>user_id</code> value.
Attempts:
2 left
💡 Hint

Think about the WHERE clause filtering rows with user_id IS NULL.

data_output
intermediate
1:30remaining
How many rows will be stored after this test failure query?

Given the table orders with 1000 rows, 50 have order_date NULL. The test stores failures with:

INSERT INTO failure_log (test_name, order_id)
SELECT 'not_null_order_date', order_id
FROM orders
WHERE order_date IS NULL;

How many rows will failure_log have after this?

A1000 rows, all orders regardless of <code>order_date</code>.
B50 rows, one for each order with NULL <code>order_date</code>.
C0 rows, because <code>order_date</code> is not selected.
D950 rows, orders with non-NULL <code>order_date</code>.
Attempts:
2 left
💡 Hint

Count rows where order_date IS NULL.

🔧 Debug
advanced
2:30remaining
Why does this query fail to store test failures correctly?

Look at this SQL snippet intended to store failures where email is NULL:

INSERT INTO test_failures (test_name, failed_row_id)
SELECT 'not_null_email', user_id
FROM users
WHERE email = NULL;

Why does it not store any rows?

ABecause the <code>SELECT</code> statement is missing a <code>GROUP BY</code> clause.
BBecause <code>user_id</code> is not a valid column in <code>users</code>.
CBecause the table <code>test_failures</code> does not exist.
DBecause <code>email = NULL</code> always returns false; use <code>IS NULL</code> instead.
Attempts:
2 left
💡 Hint

Remember how SQL compares NULL values.

🚀 Application
advanced
3:00remaining
Which SQL query correctly stores failures for a uniqueness test on customer_id?

You want to store rows where customer_id is duplicated in sales. Which query does this correctly?

A
INSERT INTO failure_log (test_name, failed_customer_id)
SELECT 'unique_customer_id', customer_id
FROM sales
GROUP BY customer_id
HAVING COUNT(*) &gt; 1;
B
INSERT INTO failure_log (test_name, failed_customer_id)
SELECT 'unique_customer_id', customer_id
FROM sales
WHERE customer_id IS NULL;
C
INSERT INTO failure_log (test_name, failed_customer_id)
SELECT 'unique_customer_id', customer_id
FROM sales
WHERE COUNT(customer_id) &gt; 1;
D
INSERT INTO failure_log (test_name, failed_customer_id)
SELECT 'unique_customer_id', customer_id
FROM sales
GROUP BY customer_id
HAVING COUNT(customer_id) = 1;
Attempts:
2 left
💡 Hint

Think about how to find duplicates using GROUP BY and HAVING.

🧠 Conceptual
expert
2:00remaining
What is the main benefit of storing test failures in a separate table for analysis?

Why do data teams store test failures in dedicated tables instead of just logging errors?

AIt allows easy querying and tracking of failing data rows over time for root cause analysis.
BIt reduces the size of the main data tables by removing failed rows permanently.
CIt automatically fixes the data errors without manual intervention.
DIt prevents the data pipeline from running if any test fails.
Attempts:
2 left
💡 Hint

Think about how stored failures help in understanding data quality issues.