Bird
Raised Fist0
dbtdata~15 mins

ref() function for model dependencies in dbt - Deep Dive

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Overview - ref() function for model dependencies
What is it?
The ref() function in dbt is a way to tell your project that one model depends on another. It creates a link between models so dbt knows the order to run them. This helps build complex data pipelines by managing dependencies automatically. It also makes your code easier to read and maintain.
Why it matters
Without ref(), you would have to manually manage the order of running models and write full table names everywhere. This would be error-prone and hard to update. ref() solves this by tracking dependencies and generating the correct SQL references. This means your data pipeline runs smoothly and changes in one model automatically update downstream models.
Where it fits
Before learning ref(), you should understand basic SQL and how dbt models work. After mastering ref(), you can learn about advanced dbt features like macros, snapshots, and testing. ref() is a foundational concept that connects your models and enables dbt's powerful dependency management.
Mental Model
Core Idea
ref() is a function that links one dbt model to another, telling dbt which models depend on each other and in what order to run them.
Think of it like...
Imagine building a LEGO castle where each piece must be placed in a specific order. ref() is like the instruction manual that tells you which piece to put next so the castle stands strong.
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  model_a    │────▶│  model_b    │────▶│  model_c    │
└─────────────┘     └─────────────┘     └─────────────┘
       ▲                  ▲                  ▲
       │                  │                  │
    ref('model_a')     ref('model_b')     ref('model_c')
Build-Up - 6 Steps
1
FoundationUnderstanding dbt Models
🤔
Concept: Learn what a dbt model is and how it represents a SQL query that creates a table or view.
In dbt, a model is a SQL file that defines a dataset. When you run dbt, it runs these SQL queries to build tables or views in your database. Models are the building blocks of your data pipeline.
Result
You can create simple tables or views by writing SQL files in your dbt project.
Knowing what a model is helps you understand what ref() will link together.
2
FoundationBasic SQL References in dbt
🤔
Concept: Learn how to reference tables in SQL and why hardcoding table names can cause problems.
Normally, in SQL, you write SELECT * FROM schema.table. But if the table name or schema changes, you must update all queries manually. This is error-prone and hard to maintain.
Result
You see that hardcoded table names make your SQL fragile and hard to update.
Understanding this problem sets the stage for why ref() is needed.
3
IntermediateUsing ref() to Link Models
🤔Before reading on: do you think ref() returns a table name string or runs the SQL query directly? Commit to your answer.
Concept: ref() returns the correct table or view name for a model, allowing you to link models without hardcoding names.
In your SQL model, instead of writing SELECT * FROM schema.table, you write SELECT * FROM {{ ref('model_name') }}. This tells dbt to replace ref('model_name') with the actual table name at runtime.
Result
Your SQL becomes dynamic and automatically updates if the model's name or schema changes.
Understanding that ref() returns the correct table name helps you write flexible, maintainable SQL.
4
IntermediateHow ref() Manages Model Dependencies
🤔Before reading on: do you think dbt runs models in the order they appear in your project or based on ref() calls? Commit to your answer.
Concept: ref() tells dbt which models depend on others, so dbt runs them in the correct order automatically.
When dbt sees ref('model_a') inside model_b, it knows model_b depends on model_a. dbt builds a graph of these dependencies and runs models from the bottom up, ensuring data is ready when needed.
Result
Your data pipeline runs smoothly without manual ordering.
Knowing that ref() builds a dependency graph prevents errors from running models in the wrong order.
5
Advancedref() and Environment Awareness
🤔Before reading on: do you think ref() changes behavior when running in different environments like dev or prod? Commit to your answer.
Concept: ref() adapts table references based on the target environment, making your project portable across environments.
dbt uses ref() to generate table names that include environment-specific prefixes or schemas. For example, in dev, ref('model_a') might point to dev_schema.model_a, while in prod it points to prod_schema.model_a.
Result
You can run the same project in multiple environments without changing SQL code.
Understanding environment-aware references helps you build robust, multi-environment pipelines.
6
Expertref() in Complex Dependency Graphs
🤔Before reading on: do you think circular dependencies are allowed with ref()? Commit to your answer.
Concept: ref() builds a directed acyclic graph (DAG) of model dependencies and prevents circular references to ensure reliable builds.
dbt analyzes all ref() calls to create a DAG. If a circular dependency is detected (model_a refs model_b and model_b refs model_a), dbt throws an error. This ensures your pipeline has a clear, unambiguous order.
Result
Your project structure remains clean and error-free, even with many models.
Knowing that ref() enforces acyclic dependencies helps you design scalable, maintainable data pipelines.
Under the Hood
ref() is a Jinja macro that, during compilation, replaces the call with the fully qualified table name of the referenced model. dbt builds a dependency graph by parsing all ref() calls across models, then orders model execution accordingly. This graph is a directed acyclic graph ensuring no circular dependencies. At runtime, ref() adapts to the target environment's schema and database settings.
Why designed this way?
ref() was designed to solve the problem of managing complex dependencies in SQL-based data pipelines. Hardcoding table names was error-prone and inflexible. By using ref(), dbt can automate dependency tracking, environment management, and model ordering, making pipelines more reliable and easier to maintain.
┌───────────────┐
│  Model Files  │
└──────┬────────┘
       │ parse ref() calls
       ▼
┌─────────────────────┐
│ Dependency Graph    │
│ (DAG of models)     │
└──────┬──────────────┘
       │ topological sort
       ▼
┌─────────────────────┐
│ Ordered Model Runs  │
└──────┬──────────────┘
       │ during compilation
       ▼
┌─────────────────────┐
│ SQL with replaced   │
│ ref() calls         │
└─────────────────────┘
Myth Busters - 3 Common Misconceptions
Quick: Does ref() execute the referenced model's SQL immediately when called? Commit to yes or no.
Common Belief:ref() runs the referenced model's SQL query immediately and returns its data.
Tap to reveal reality
Reality:ref() only returns the table name of the referenced model; it does not execute SQL at call time. dbt runs models in the correct order later.
Why it matters:Thinking ref() runs SQL immediately leads to confusion about execution order and can cause incorrect assumptions about data availability.
Quick: Can you use ref() to reference models outside your current dbt project? Commit to yes or no.
Common Belief:ref() can reference any table in the database, even if it's not part of the dbt project.
Tap to reveal reality
Reality:ref() only works with models defined inside the current dbt project. For external tables, you must use full table names or sources.
Why it matters:Misusing ref() for external tables causes compilation errors and breaks the dependency graph.
Quick: Does ref() automatically handle schema changes in referenced models? Commit to yes or no.
Common Belief:ref() automatically updates your SQL if the schema or columns of the referenced model change.
Tap to reveal reality
Reality:ref() only manages table names and dependencies, not schema or column changes. You must update your SQL logic accordingly.
Why it matters:Assuming ref() handles schema changes can cause runtime errors if columns are missing or renamed.
Expert Zone
1
ref() calls are resolved during compilation, not runtime, which means dynamic SQL generation depends on the compilation context.
2
Using ref() inside macros or hooks requires careful handling because the context may differ, affecting how dependencies are tracked.
3
ref() supports cross-project references when using dbt packages, but this requires explicit configuration and understanding of package namespaces.
When NOT to use
ref() should not be used to reference external tables or views not managed by dbt; instead, use source() for external data. Also, avoid using ref() in raw SQL outside dbt models, as it requires compilation context.
Production Patterns
In production, teams use ref() to build modular, reusable models that form a clear dependency graph. This enables incremental builds, testing, and documentation generation. Complex projects often combine ref() with source() and macros to manage both internal and external data dependencies.
Connections
Directed Acyclic Graph (DAG)
ref() builds a DAG of model dependencies, similar to how DAGs represent workflows in other fields.
Understanding DAGs from project management or computer science helps grasp how dbt orders model runs without cycles.
Makefile Dependency Management
ref() functions like dependencies in a Makefile, where targets depend on other files to build in order.
Knowing how Makefiles track dependencies clarifies how ref() automates build order in data pipelines.
Software Package Imports
ref() is like importing modules in programming, where one module depends on another to function.
Seeing ref() as an import mechanism helps understand modularity and dependency resolution in dbt projects.
Common Pitfalls
#1Referencing models with hardcoded table names instead of ref()
Wrong approach:SELECT * FROM analytics.sales_data;
Correct approach:SELECT * FROM {{ ref('sales_data') }};
Root cause:Not understanding that hardcoding table names breaks dependency tracking and environment flexibility.
#2Using ref() to reference tables outside the dbt project
Wrong approach:SELECT * FROM {{ ref('external_table') }};
Correct approach:SELECT * FROM {{ source('external_schema', 'external_table') }};
Root cause:Confusing ref() with source(), leading to compilation errors.
#3Creating circular dependencies with ref() calls
Wrong approach:model_a.sql contains {{ ref('model_b') }} and model_b.sql contains {{ ref('model_a') }}
Correct approach:Refactor models to remove circular references, e.g., combine logic or create intermediate models.
Root cause:Not realizing ref() builds a DAG that cannot have cycles.
Key Takeaways
ref() is the core function in dbt that links models and manages dependencies automatically.
Using ref() instead of hardcoded table names makes your SQL flexible and environment-aware.
ref() builds a directed acyclic graph of models, ensuring correct build order and preventing circular dependencies.
ref() only returns table names during compilation and does not execute SQL immediately.
Understanding ref() is essential for building scalable, maintainable, and robust data pipelines with dbt.

Practice

(1/5)
1. What is the main purpose of the ref() function in dbt?
easy
A. To create new database users
B. To write raw SQL queries inside dbt models
C. To link models and define dependencies between them
D. To schedule dbt runs automatically

Solution

  1. Step 1: Understand the role of ref()

    The ref() function is used to link one model to another in dbt, so dbt knows the order to run models and their dependencies.
  2. Step 2: Identify what ref() does not do

    It does not write raw SQL, create users, or schedule runs. Its main role is linking models.
  3. Final Answer:

    To link models and define dependencies between them -> Option C
  4. Quick Check:

    ref() links models = A [OK]
Hint: Remember: ref() connects models, not SQL or users [OK]
Common Mistakes:
  • Thinking ref() writes SQL code
  • Confusing ref() with scheduling tools
  • Assuming ref() manages database users
2. Which of the following is the correct syntax to reference a model named customers inside another model using ref()?
easy
A. select * from {{ ref('customers') }}
B. select * from ref('customers')
C. select * from ref(customers)
D. select * from {{ ref(customers) }}

Solution

  1. Step 1: Recall dbt Jinja syntax for ref()

    In dbt, ref() must be wrapped in double curly braces and the model name must be a string in quotes.
  2. Step 2: Check each option

    select * from {{ ref('customers') }} uses {{ ref('customers') }} which is correct. Options B and C miss the curly braces or quotes. select * from {{ ref(customers) }} misses quotes around the model name.
  3. Final Answer:

    select * from {{ ref('customers') }} -> Option A
  4. Quick Check:

    Use {{ ref('model_name') }} syntax = A [OK]
Hint: Always use {{ ref('model_name') }} with quotes and braces [OK]
Common Mistakes:
  • Omitting curly braces {{ }}
  • Not putting model name in quotes
  • Using ref() without Jinja syntax
3. Given the following dbt model code, what will be the output SQL after compilation if the orders model exists?
select order_id, customer_id
from {{ ref('orders') }}
medium
A. select order_id, customer_id from orders
B. select order_id, customer_id from {{ ref('orders') }}
C. select order_id, customer_id from dbt.orders
D. select order_id, customer_id from ref('orders')

Solution

  1. Step 1: Understand what ref() compiles to

    The ref() function compiles to the actual table name of the referenced model, usually just the model name like 'orders'.
  2. Step 2: Check the compiled SQL output

    The compiled SQL replaces {{ ref('orders') }} with orders, so the output is select order_id, customer_id from orders.
  3. Final Answer:

    select order_id, customer_id from orders -> Option A
  4. Quick Check:

    ref('orders') compiles to orders = C [OK]
Hint: ref() compiles to the model's table name without braces [OK]
Common Mistakes:
  • Leaving ref() uncompiled in SQL
  • Adding extra schema prefix without config
  • Using ref() as a string literal
4. You wrote this dbt model code:
select * from ref('sales')

When you run dbt, you get an error. What is the problem?
medium
A. Quotes around 'sales' should be removed
B. Model name 'sales' does not exist
C. ref() cannot be used inside select statements
D. Missing double curly braces around ref()

Solution

  1. Step 1: Check the syntax of ref() usage

    In dbt, ref() must be wrapped in double curly braces to be interpreted as Jinja code.
  2. Step 2: Identify the error cause

    The code uses ref('sales') without {{ }}, so dbt treats it as plain text, causing an error.
  3. Final Answer:

    Missing double curly braces around ref() -> Option D
  4. Quick Check:

    Use {{ ref('model') }} not ref('model') alone = D [OK]
Hint: Always wrap ref() in {{ }} to avoid errors [OK]
Common Mistakes:
  • Forgetting {{ }} around ref()
  • Assuming ref() works without Jinja
  • Removing quotes from model name
5. You have two models: customers and orders. You want to create a new model customer_orders that joins these two. Which is the best way to use ref() to ensure correct dependencies and flexible naming?
hard
A. select c.customer_id, o.order_id from customers c join orders o on c.customer_id = o.customer_id
B. select c.customer_id, o.order_id from {{ ref('customers') }} c join {{ ref('orders') }} o on c.customer_id = o.customer_id
C. select c.customer_id, o.order_id from 'customers' c join 'orders' o on c.customer_id = o.customer_id
D. select c.customer_id, o.order_id from ref('customers') c join ref('orders') o on c.customer_id = o.customer_id

Solution

  1. Step 1: Use ref() with correct Jinja syntax for both models

    To link models and ensure dbt knows dependencies, use {{ ref('model_name') }} for both customers and orders.
  2. Step 2: Avoid hardcoding table names or missing Jinja syntax

    Options A and C hardcode names or use quotes incorrectly. select c.customer_id, o.order_id from ref('customers') c join ref('orders') o on c.customer_id = o.customer_id misses curly braces, so it won't compile.
  3. Final Answer:

    select c.customer_id, o.order_id from {{ ref('customers') }} c join {{ ref('orders') }} o on c.customer_id = o.customer_id -> Option B
  4. Quick Check:

    Use {{ ref('model') }} for all dependencies = B [OK]
Hint: Use {{ ref('model') }} for all model references [OK]
Common Mistakes:
  • Hardcoding table names instead of using ref()
  • Forgetting curly braces around ref()
  • Using quotes incorrectly around model names