In dbt, models are organized in a Directed Acyclic Graph (DAG). What does it mean when a model depends on another model in this DAG?
Think about the order of execution when one model uses data from another.
In a DAG, a model that depends on another must run after the model it depends on to ensure data is available.
Given these dbt models with dependencies:
- model_a (no dependencies)
- model_b depends on model_a
- model_c depends on model_a
- model_d depends on model_b and model_c
How many nodes will the DAG contain after compilation?
Count each unique model as one node.
Each model is a node. Here, model_a, model_b, model_c, and model_d make 4 nodes.
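The count can be checked with a small sketch. This is not how dbt itself stores its graph; it assumes a hypothetical `deps` dictionary that maps each model to the models it depends on, and uses `graphlib` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical in-memory stand-in for the project: each model maps to the
# set of models it depends on (its upstream parents).
deps = {
    "model_a": set(),
    "model_b": {"model_a"},
    "model_c": {"model_a"},
    "model_d": {"model_b", "model_c"},
}

# Every unique model is one node, no matter how many edges touch it.
nodes = set(deps) | {parent for parents in deps.values() for parent in parents}
node_count = len(nodes)

# Each "depends on" relationship is one edge.
edge_count = sum(len(parents) for parents in deps.values())

# A valid run order places every model after all of its parents.
order = list(TopologicalSorter(deps).static_order())

print(node_count, edge_count, order)
```

model_a always comes first and model_d last; model_b and model_c may appear in either order because neither depends on the other.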
Consider this simplified dbt project with models and their dependencies:
- model_x.sql: no dependencies
- model_y.sql: depends on model_x
- model_z.sql: depends on model_y
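If you run dbt run --select +model_z, which models will be run and in what order?
The + prefix selects a model together with everything upstream of it.
dbt builds upstream models first so their data is ready: model_x runs, then model_y, then model_z. Without the + prefix, dbt run --select model_z (or the older --models flag) builds only model_z.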
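A rough sketch of what a selection that includes upstream parents, such as +model_z, resolves to. The `deps` dictionary and the helpers `with_ancestors` and `run_order` are hypothetical names for illustration, not part of dbt; the ordering uses `graphlib` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical stand-in for the project: model -> models it depends on.
deps = {
    "model_x": set(),
    "model_y": {"model_x"},
    "model_z": {"model_y"},
}

def with_ancestors(target, deps):
    """Return the target plus every model upstream of it."""
    selected, stack = set(), [target]
    while stack:
        model = stack.pop()
        if model not in selected:
            selected.add(model)
            stack.extend(deps[model])
    return selected

def run_order(selected, deps):
    """Topologically sort the selected models, parents first."""
    # Keep only edges between selected models.
    subgraph = {model: deps[model] & selected for model in selected}
    return list(TopologicalSorter(subgraph).static_order())

upstream_run = run_order(with_ancestors("model_z", deps), deps)
only_target = run_order({"model_z"}, deps)

print(upstream_run)
print(only_target)
```

Comparing `upstream_run` with `only_target` shows the difference between selecting a model with its parents and selecting the model alone.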
You have these dbt models:
- model_1 depends on model_2
- model_2 depends on model_3
- model_3 depends on model_1
What error will dbt raise when compiling this DAG?
Think about what happens when dependencies loop back to the start.
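A DAG cannot contain cycles. Here model_1 depends on model_2, model_2 on model_3, and model_3 back on model_1, so no valid run order exists. dbt detects the loop while building the graph and fails with a cyclic dependency error instead of running any model.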
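The same kind of check can be sketched outside dbt. The example below is an illustration, not dbt's own implementation: it assumes a hypothetical `deps` dictionary and a hypothetical helper `find_cycle`, and relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+), which raises `CycleError` when no valid ordering exists.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical stand-in for the project: model -> models it depends on.
deps = {
    "model_1": {"model_2"},
    "model_2": {"model_3"},
    "model_3": {"model_1"},
}

def find_cycle(deps):
    """Return the models forming a cycle, or None if the graph is acyclic."""
    try:
        TopologicalSorter(deps).prepare()
    except CycleError as err:
        # args[1] is the cycle as a list whose first and last entries match.
        return err.args[1]
    return None

cycle = find_cycle(deps)
print(cycle)
```

Breaking any one of the three dependencies makes the graph acyclic again, and `find_cycle` then returns None.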
You have a large dbt project with many models. After changing model_sales, you want to run only model_sales and the models downstream of it. Which dbt command achieves this?
Use dbt's selector syntax to include downstream models.
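The trailing + in the selector model_sales+ selects model_sales and every model downstream of it, so dbt run --select model_sales+ runs exactly those models and leaves the rest of the project untouched.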
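A minimal sketch of how a downstream selection like model_sales+ can be resolved. Apart from model_sales, every model name here is hypothetical, as are the `deps` dictionary and the helper `with_descendants`; dbt's real selector logic is more involved.

```python
# Hypothetical project: model -> models it depends on.
deps = {
    "model_sales": set(),
    "model_revenue": {"model_sales"},
    "model_report": {"model_revenue"},
    "model_customers": set(),  # unrelated to model_sales
}

def with_descendants(target, deps):
    """Return the target plus every model downstream of it."""
    # Invert the edges so each model maps to the models that depend on it.
    children = {model: set() for model in deps}
    for model, parents in deps.items():
        for parent in parents:
            children.setdefault(parent, set()).add(model)

    selected, stack = set(), [target]
    while stack:
        model = stack.pop()
        if model not in selected:
            selected.add(model)
            stack.extend(children.get(model, ()))
    return selected

selected = with_descendants("model_sales", deps)
print(sorted(selected))
```

model_customers is left out because nothing connects it to model_sales, which is the point of a downstream selection in a large project.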