0
0
dbtdata~15 mins

ref() function for model dependencies in dbt - Deep Dive

Choose your learning style9 modes available
Overview - ref() function for model dependencies
What is it?
The ref() function in dbt is a way to tell your project that one model depends on another. It creates a link between models so dbt knows the order to run them. This helps build complex data pipelines by managing dependencies automatically. It also makes your code easier to read and maintain.
Why it matters
Without ref(), you would have to manually manage the order of running models and write full table names everywhere. This would be error-prone and hard to update. ref() solves this by tracking dependencies and generating the correct SQL references. This means your data pipeline runs smoothly and changes in one model automatically update downstream models.
Where it fits
Before learning ref(), you should understand basic SQL and how dbt models work. After mastering ref(), you can learn about advanced dbt features like macros, snapshots, and testing. ref() is a foundational concept that connects your models and enables dbt's powerful dependency management.
Mental Model
Core Idea
ref() is a function that links one dbt model to another, telling dbt which models depend on each other and in what order to run them.
Think of it like...
Imagine building a LEGO castle where each piece must be placed in a specific order. ref() is like the instruction manual that tells you which piece to put next so the castle stands strong.
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  model_a    │────▶│  model_b    │────▶│  model_c    │
└─────────────┘     └─────────────┘     └─────────────┘
       ▲                  ▲                  ▲
       │                  │                  │
    ref('model_a')     ref('model_b')     ref('model_c')
Build-Up - 6 Steps
1
FoundationUnderstanding dbt Models
🤔
Concept: Learn what a dbt model is and how it represents a SQL query that creates a table or view.
In dbt, a model is a SQL file that defines a dataset. When you run dbt, it runs these SQL queries to build tables or views in your database. Models are the building blocks of your data pipeline.
Result
You can create simple tables or views by writing SQL files in your dbt project.
Knowing what a model is helps you understand what ref() will link together.
2
FoundationBasic SQL References in dbt
🤔
Concept: Learn how to reference tables in SQL and why hardcoding table names can cause problems.
Normally, in SQL, you write SELECT * FROM schema.table. But if the table name or schema changes, you must update all queries manually. This is error-prone and hard to maintain.
Result
You see that hardcoded table names make your SQL fragile and hard to update.
Understanding this problem sets the stage for why ref() is needed.
3
IntermediateUsing ref() to Link Models
🤔Before reading on: do you think ref() returns a table name string or runs the SQL query directly? Commit to your answer.
Concept: ref() returns the correct table or view name for a model, allowing you to link models without hardcoding names.
In your SQL model, instead of writing SELECT * FROM schema.table, you write SELECT * FROM {{ ref('model_name') }}. This tells dbt to replace ref('model_name') with the actual table name at runtime.
Result
Your SQL becomes dynamic and automatically updates if the model's name or schema changes.
Understanding that ref() returns the correct table name helps you write flexible, maintainable SQL.
4
IntermediateHow ref() Manages Model Dependencies
🤔Before reading on: do you think dbt runs models in the order they appear in your project or based on ref() calls? Commit to your answer.
Concept: ref() tells dbt which models depend on others, so dbt runs them in the correct order automatically.
When dbt sees ref('model_a') inside model_b, it knows model_b depends on model_a. dbt builds a graph of these dependencies and runs models from the bottom up, ensuring data is ready when needed.
Result
Your data pipeline runs smoothly without manual ordering.
Knowing that ref() builds a dependency graph prevents errors from running models in the wrong order.
5
Advancedref() and Environment Awareness
🤔Before reading on: do you think ref() changes behavior when running in different environments like dev or prod? Commit to your answer.
Concept: ref() adapts table references based on the target environment, making your project portable across environments.
dbt uses ref() to generate table names that include environment-specific prefixes or schemas. For example, in dev, ref('model_a') might point to dev_schema.model_a, while in prod it points to prod_schema.model_a.
Result
You can run the same project in multiple environments without changing SQL code.
Understanding environment-aware references helps you build robust, multi-environment pipelines.
6
Expertref() in Complex Dependency Graphs
🤔Before reading on: do you think circular dependencies are allowed with ref()? Commit to your answer.
Concept: ref() builds a directed acyclic graph (DAG) of model dependencies and prevents circular references to ensure reliable builds.
dbt analyzes all ref() calls to create a DAG. If a circular dependency is detected (model_a refs model_b and model_b refs model_a), dbt throws an error. This ensures your pipeline has a clear, unambiguous order.
Result
Your project structure remains clean and error-free, even with many models.
Knowing that ref() enforces acyclic dependencies helps you design scalable, maintainable data pipelines.
Under the Hood
ref() is a Jinja macro that, during compilation, replaces the call with the fully qualified table name of the referenced model. dbt builds a dependency graph by parsing all ref() calls across models, then orders model execution accordingly. This graph is a directed acyclic graph ensuring no circular dependencies. At runtime, ref() adapts to the target environment's schema and database settings.
Why designed this way?
ref() was designed to solve the problem of managing complex dependencies in SQL-based data pipelines. Hardcoding table names was error-prone and inflexible. By using ref(), dbt can automate dependency tracking, environment management, and model ordering, making pipelines more reliable and easier to maintain.
┌───────────────┐
│  Model Files  │
└──────┬────────┘
       │ parse ref() calls
       ▼
┌─────────────────────┐
│ Dependency Graph    │
│ (DAG of models)     │
└──────┬──────────────┘
       │ topological sort
       ▼
┌─────────────────────┐
│ Ordered Model Runs  │
└──────┬──────────────┘
       │ during compilation
       ▼
┌─────────────────────┐
│ SQL with replaced   │
│ ref() calls         │
└─────────────────────┘
Myth Busters - 3 Common Misconceptions
Quick: Does ref() execute the referenced model's SQL immediately when called? Commit to yes or no.
Common Belief:ref() runs the referenced model's SQL query immediately and returns its data.
Tap to reveal reality
Reality:ref() only returns the table name of the referenced model; it does not execute SQL at call time. dbt runs models in the correct order later.
Why it matters:Thinking ref() runs SQL immediately leads to confusion about execution order and can cause incorrect assumptions about data availability.
Quick: Can you use ref() to reference models outside your current dbt project? Commit to yes or no.
Common Belief:ref() can reference any table in the database, even if it's not part of the dbt project.
Tap to reveal reality
Reality:ref() only works with models defined inside the current dbt project. For external tables, you must use full table names or sources.
Why it matters:Misusing ref() for external tables causes compilation errors and breaks the dependency graph.
Quick: Does ref() automatically handle schema changes in referenced models? Commit to yes or no.
Common Belief:ref() automatically updates your SQL if the schema or columns of the referenced model change.
Tap to reveal reality
Reality:ref() only manages table names and dependencies, not schema or column changes. You must update your SQL logic accordingly.
Why it matters:Assuming ref() handles schema changes can cause runtime errors if columns are missing or renamed.
Expert Zone
1
ref() calls are resolved during compilation, not runtime, which means dynamic SQL generation depends on the compilation context.
2
Using ref() inside macros or hooks requires careful handling because the context may differ, affecting how dependencies are tracked.
3
ref() supports cross-project references when using dbt packages, but this requires explicit configuration and understanding of package namespaces.
When NOT to use
ref() should not be used to reference external tables or views not managed by dbt; instead, use source() for external data. Also, avoid using ref() in raw SQL outside dbt models, as it requires compilation context.
Production Patterns
In production, teams use ref() to build modular, reusable models that form a clear dependency graph. This enables incremental builds, testing, and documentation generation. Complex projects often combine ref() with source() and macros to manage both internal and external data dependencies.
Connections
Directed Acyclic Graph (DAG)
ref() builds a DAG of model dependencies, similar to how DAGs represent workflows in other fields.
Understanding DAGs from project management or computer science helps grasp how dbt orders model runs without cycles.
Makefile Dependency Management
ref() functions like dependencies in a Makefile, where targets depend on other files to build in order.
Knowing how Makefiles track dependencies clarifies how ref() automates build order in data pipelines.
Software Package Imports
ref() is like importing modules in programming, where one module depends on another to function.
Seeing ref() as an import mechanism helps understand modularity and dependency resolution in dbt projects.
Common Pitfalls
#1Referencing models with hardcoded table names instead of ref()
Wrong approach:SELECT * FROM analytics.sales_data;
Correct approach:SELECT * FROM {{ ref('sales_data') }};
Root cause:Not understanding that hardcoding table names breaks dependency tracking and environment flexibility.
#2Using ref() to reference tables outside the dbt project
Wrong approach:SELECT * FROM {{ ref('external_table') }};
Correct approach:SELECT * FROM {{ source('external_schema', 'external_table') }};
Root cause:Confusing ref() with source(), leading to compilation errors.
#3Creating circular dependencies with ref() calls
Wrong approach:model_a.sql contains {{ ref('model_b') }} and model_b.sql contains {{ ref('model_a') }}
Correct approach:Refactor models to remove circular references, e.g., combine logic or create intermediate models.
Root cause:Not realizing ref() builds a DAG that cannot have cycles.
Key Takeaways
ref() is the core function in dbt that links models and manages dependencies automatically.
Using ref() instead of hardcoded table names makes your SQL flexible and environment-aware.
ref() builds a directed acyclic graph of models, ensuring correct build order and preventing circular dependencies.
ref() only returns table names during compilation and does not execute SQL immediately.
Understanding ref() is essential for building scalable, maintainable, and robust data pipelines with dbt.