Given a table clustered by customer_id, what is the expected effect on query performance when filtering by customer_id?
select * from sales where customer_id = 12345;
Clustering keys group data physically by the key, reducing scan scope.
Clustering keys organize data so queries filtering on those keys scan fewer partitions, improving speed.
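The pruning effect described above can be sketched with a toy model. This is a simplified simulation, not how any particular warehouse implements it: it assumes each partition stores only the minimum and maximum customer_id it contains, and that a partition is scanned whenever the filter value falls inside that range. The row count, partition size, and helper names are hypothetical.

```python
import random

def build_partitions(keys, size):
    """Split keys into fixed-size chunks and record each chunk's (min, max)."""
    chunks = [keys[i:i + size] for i in range(0, len(keys), size)]
    return [(min(c), max(c)) for c in chunks]

def partitions_scanned(partitions, value):
    """Count partitions whose [min, max] range could contain the value."""
    return sum(1 for lo, hi in partitions if lo <= value <= hi)

random.seed(0)
# Hypothetical data: 10,000 rows with customer_id drawn uniformly from 1..20000.
customer_ids = [random.randint(1, 20000) for _ in range(10_000)]

# Unclustered: rows land in partitions in arrival order.
unclustered = build_partitions(customer_ids, 100)
# Clustered: rows are sorted by customer_id before being split into partitions.
clustered = build_partitions(sorted(customer_ids), 100)

scanned_unclustered = partitions_scanned(unclustered, 12345)
scanned_clustered = partitions_scanned(clustered, 12345)
print(scanned_unclustered, scanned_clustered)
```

In this model the clustered layout touches at most a couple of partitions for customer_id = 12345, while the unclustered layout touches nearly all of them, because every unclustered partition spans almost the full key range.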
How many rows will the target table contain after running this incremental model if the target already holds all 1000 source rows and 200 of those rows have changed since the last run?
{{ config(materialized='incremental', unique_key='id') }}  -- each run selects only the changed rows
Incremental models update existing rows based on unique keys.
The incremental run matches the 200 changed rows on id and updates them in place instead of inserting duplicates, so the total stays at 1000 rows.
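The row-count arithmetic can be checked with a small upsert sketch. It only mimics the merge-on-unique-key behavior in plain Python and is not dbt's implementation; the function name `incremental_merge` and the sample rows are hypothetical.

```python
def incremental_merge(target, changed_rows, unique_key="id"):
    """Upsert: replace rows whose key already exists, append rows whose key is new."""
    merged = {row[unique_key]: row for row in target}
    for row in changed_rows:
        merged[row[unique_key]] = row
    return list(merged.values())

# Hypothetical data: the target already holds 1000 rows with ids 0..999.
target = [{"id": i, "amount": 10} for i in range(1000)]
# 200 of those rows changed in the source since the last run.
changed = [{"id": i, "amount": 99} for i in range(200)]

result = incremental_merge(target, changed)
row_count = len(result)
updated_count = sum(1 for r in result if r["amount"] == 99)
print(row_count, updated_count)  # prints: 1000 200

# A row with an id the target has never seen is appended instead.
with_new = incremental_merge(result, [{"id": 1000, "amount": 5}])
print(len(with_new))  # prints: 1001
```

The count only grows when a changed row carries an id that is not yet in the target, which is why the answer depends on the 200 changed rows being updates rather than new records.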
A dbt model runs slowly despite using clustering keys. Which option explains the likely cause?
select * from orders where order_date between '2023-01-01' and '2023-01-31'; -- clustered on customer_id
Clustering helps only when filtering on the clustered columns.
Clustering on customer_id does not speed up queries filtering on order_date, so the query scans many partitions.
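A toy layout shows why the date filter gets no help here. As a simplifying assumption, each partition keeps only the minimum and maximum order day of its rows and must be scanned when that range overlaps the filter; order dates are modeled as day-of-year integers and are assumed to be independent of customer_id. All data and helper names are hypothetical.

```python
import random

def date_ranges(rows, size):
    """Chunk rows into partitions and record each partition's (min, max) order day."""
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    return [(min(d for _, d in c), max(d for _, d in c)) for c in chunks]

def scanned(ranges, start, end):
    """Count partitions whose day range overlaps the filter range [start, end]."""
    return sum(1 for lo, hi in ranges if lo <= end and hi >= start)

random.seed(1)
# Hypothetical rows: (customer_id, order_day), with order_day in 1..365.
rows = [(random.randint(1, 5000), random.randint(1, 365)) for _ in range(10_000)]

# Same rows, two physical layouts: sorted by customer_id vs. sorted by order_day.
by_customer = date_ranges(sorted(rows, key=lambda r: r[0]), 100)
by_date = date_ranges(sorted(rows, key=lambda r: r[1]), 100)

# Filter equivalent to January: days 1 through 31.
january_by_customer = scanned(by_customer, 1, 31)
january_by_date = scanned(by_date, 1, 31)
print(january_by_customer, january_by_date)
```

With rows ordered by customer_id, almost every partition contains some January order and has to be read; ordering by date confines January to a small fraction of the partitions.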
You have a large sales fact table with billions of rows. Which partitioning strategy is best to optimize queries filtering by date and region?
Partitioning works best on low-cardinality columns used in filters.
Partitioning by date reduces data scanned by date filters; clustering by region further optimizes region filters.
How do materialized views affect compute costs in a cloud data warehouse?
Think about how precomputed data affects query execution.
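Materialized views store precomputed results, so queries that read them run faster and use less compute. The view itself still costs storage and refresh compute, so total cost drops only when the view is queried often relative to how frequently its underlying data changes.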
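The trade-off between cheaper queries and refresh overhead can be sketched as simple arithmetic. Every number below is a made-up compute unit chosen for illustration, not a measurement or a vendor price, and `daily_compute` is a hypothetical helper.

```python
def daily_compute(queries_per_day, cost_per_query,
                  refreshes_per_day=0, cost_per_refresh=0.0):
    """Total daily compute: query cost plus any refresh (maintenance) cost."""
    return queries_per_day * cost_per_query + refreshes_per_day * cost_per_refresh

# Hypothetical workload A: the aggregate is queried 500 times a day.
base_busy = daily_compute(500, 2.0)                       # recompute on every query
mv_busy = daily_compute(500, 0.25, refreshes_per_day=24,  # read precomputed result,
                        cost_per_refresh=5.0)             # pay for hourly refreshes

# Hypothetical workload B: the same aggregate is queried only 10 times a day.
base_quiet = daily_compute(10, 2.0)
mv_quiet = daily_compute(10, 0.25, refreshes_per_day=24, cost_per_refresh=5.0)

print(base_busy, mv_busy)    # prints: 1000.0 245.0
print(base_quiet, mv_quiet)  # prints: 20.0 122.5
```

Under these assumed numbers the materialized view pays off for the frequently queried workload and costs more for the rarely queried one, since the refresh cost is incurred regardless of how often the view is read.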