
One model per source table rule in dbt - Deep Dive

Overview - One model per source table rule
What is it?
The 'One model per source table' rule means creating a separate data model in dbt for each raw table you get from your data source. Instead of mixing data from many tables into one model, you keep them separate at first. This helps keep your data organized and easier to understand. Each model represents a clean, transformed version of one source table.
Why it matters
Without this rule, your data models can become confusing and hard to maintain because they mix many sources together. It becomes difficult to track where data comes from or fix errors. By having one model per source table, you get clear, simple building blocks that make your data pipeline easier to debug, update, and trust. This leads to better data quality and faster development.
Where it fits
Before learning this, you should understand basic dbt concepts like models, sources, and transformations. After mastering this rule, you can learn about combining models, building complex transformations, and optimizing your dbt project structure.
Mental Model
Core Idea
Each source table gets its own dedicated model to keep data transformations clear, modular, and easy to manage.
Think of it like...
It's like organizing your kitchen: you keep fruits in one basket, vegetables in another, and spices in a separate jar. This way, you always know where to find what you need and can prepare meals without mixing everything up.
┌────────────────┐   ┌────────────────┐
│ Source Table 1 │──▶│ Model for Tab1 │
└────────────────┘   └────────────────┘

┌────────────────┐   ┌────────────────┐
│ Source Table 2 │──▶│ Model for Tab2 │
└────────────────┘   └────────────────┘

┌────────────────┐   ┌────────────────┐
│ Source Table 3 │──▶│ Model for Tab3 │
└────────────────┘   └────────────────┘
Build-Up - 7 Steps
1
Foundation: Understanding Source Tables in dbt
Concept: Learn what source tables are and how dbt connects to raw data.
Source tables are the raw data tables in your database that you want to analyze. In dbt, you declare these tables as 'sources' so you can reference them safely in your models. This helps track where data comes from and ensures your models use the right inputs.
Result
You can now reference raw tables in your dbt project with clear source definitions.
Knowing what source tables are is the first step to organizing your data transformations clearly.
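Sources are declared in a YAML file inside your models directory. A minimal sketch, assuming a warehouse schema named raw containing orders and customers tables (all names here are illustrative):

```yaml
# models/staging/sources.yml -- path and names are illustrative
version: 2

sources:
  - name: raw          # the name you use in source() calls
    schema: raw        # the actual schema in your warehouse
    tables:
      - name: orders
      - name: customers
```

Models can then reference these tables with {{ source('raw', 'orders') }} instead of hard-coding schema and table names, which is what lets dbt track lineage back to the raw data.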
2
Foundation: What is a dbt Model?
Concept: A model is a SQL file in dbt that transforms data and creates a new table or view.
In dbt, each model is a SQL query saved as a file. When you run dbt, it runs these queries and creates tables or views in your database. Models let you clean, filter, and combine data step-by-step.
Result
You can create your first model that transforms data from a source table.
Understanding models as building blocks helps you see how data flows and changes in your project.
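A model is nothing more than a SELECT statement saved in a .sql file. A minimal sketch of a first model, assuming the raw source declared earlier (column names are hypothetical):

```sql
-- models/staging/stg_orders.sql -- the filename becomes the model name
-- dbt wraps this SELECT in a CREATE TABLE/VIEW statement when it runs.
select
    id         as order_id,
    customer_id,
    order_date,
    status
from {{ source('raw', 'orders') }}
```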
3
Intermediate: Why One Model per Source Table?
🤔 Before reading on: Do you think combining multiple source tables into one model is easier or harder to maintain? Commit to your answer.
Concept: Separating each source table into its own model keeps transformations simple and traceable.
When you create one model per source table, each model focuses on cleaning and preparing just one table. This makes it easier to find errors, update logic, and understand data lineage. It also allows other models to build on these clean tables without confusion.
Result
Your dbt project becomes modular, with clear dependencies and easier debugging.
Understanding modularity in data models prevents tangled code and improves collaboration.
4
Intermediate: Implementing One Model per Source Table
Concept: How to create separate models for each source table in dbt.
For each source table, create a SQL file in your models folder; the filename becomes the model name. Reference the raw table with the source() function, write SQL to clean or rename columns as needed, then run dbt to build each model as its own table or view.
Result
You get multiple clean models, each representing one source table's data.
Knowing how to implement this rule in practice helps maintain a clean project structure.
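Applying the rule means one such file per source table. Continuing the illustrative example, the customers table gets its own model alongside stg_orders:

```sql
-- models/staging/stg_customers.sql -- one staging model per source table
select
    id           as customer_id,
    lower(email) as email,      -- light cleaning belongs in the base model
    created_at
from {{ source('raw', 'customers') }}
```

Renaming and type-casting here, and nowhere else, gives every downstream model a single consistent version of each table.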
5
Intermediate: Building Complex Models from Base Models
🤔 Before reading on: Do you think complex models should directly query raw tables or build on base models? Commit to your answer.
Concept: Use base models (one per source table) as building blocks for more complex transformations.
After creating base models, you can write new models that join or aggregate these base models. This keeps complex logic separate from raw data cleaning. It also makes your project easier to test and maintain.
Result
Complex models become easier to understand and update because they rely on clean, simple base models.
Recognizing the layered approach to modeling improves project scalability and clarity.
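Downstream models reference the base models with ref() rather than touching raw tables. A sketch joining the two illustrative staging models from earlier steps:

```sql
-- models/marts/customer_orders.sql -- builds only on base models
select
    c.customer_id,
    c.email,
    count(o.order_id) as order_count
from {{ ref('stg_customers') }} c
left join {{ ref('stg_orders') }} o
    on o.customer_id = c.customer_id
group by 1, 2
```

ref() also records the dependency, so dbt always builds stg_customers and stg_orders before this model.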
6
Advanced: Managing Dependencies and Performance
🤔 Before reading on: Does having many small models slow down dbt runs or speed them up? Commit to your answer.
Concept: Understand how dbt manages model dependencies and how one model per source table affects performance.
dbt builds models in dependency order. Having one model per source table creates many small models, which can increase the number of steps but improves clarity. You can use incremental models or materializations to optimize performance. Also, dbt's DAG visualization helps track dependencies.
Result
You can balance clarity and performance by managing model materializations and dependencies.
Knowing how dbt executes models helps you design efficient and maintainable pipelines.
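One way to keep many small models fast is the incremental materialization, which only processes new rows on each run. A sketch, assuming an events source table with an occurred_at timestamp (names are illustrative):

```sql
-- models/staging/stg_events.sql -- incremental base model
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    event_type,
    occurred_at
from {{ source('raw', 'events') }}

{% if is_incremental() %}
  -- on incremental runs, only pick up rows newer than what this model holds
  where occurred_at > (select max(occurred_at) from {{ this }})
{% endif %}
```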
7
Expert: Handling Exceptions and Complex Sources
🤔 Before reading on: Should you always strictly follow one model per source table, even for very small or complex tables? Commit to your answer.
Concept: Learn when and how to deviate from the rule for practical reasons.
Sometimes, very small or tightly related source tables can be combined into one model to reduce complexity. Also, some sources may require multiple models for different purposes (e.g., raw staging vs. cleaned data). Experts balance strict rules with project needs, documenting exceptions clearly.
Result
You gain flexibility to adapt the rule without losing project clarity.
Understanding when to bend rules prevents over-engineering and keeps projects practical.
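A documented exception might look like this: two tiny lookup tables that are always queried together, folded into a single base model (table names are hypothetical):

```sql
-- models/staging/stg_country_codes.sql
-- Exception to the one-model-per-source-table rule:
-- these two lookup tables are trivial and always used together.
select code, country_name, 'iso2' as code_type
from {{ source('raw', 'iso2_codes') }}
union all
select code, country_name, 'iso3' as code_type
from {{ source('raw', 'iso3_codes') }}
```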
Under the Hood
dbt reads your project files and builds a Directed Acyclic Graph (DAG) of models based on dependencies. Each model runs its SQL query to create or update tables/views in the database. The 'one model per source table' rule means each source table is wrapped in a single model, which becomes a node in the DAG. This modularity allows dbt to track lineage, run models in order, and rebuild only what changed.
Why designed this way?
This rule was created to improve clarity and maintainability in data projects. Early data pipelines often mixed many sources in one model, causing confusion and bugs. Separating models by source table aligns with software engineering principles like modularity and single responsibility, making data projects easier to test, debug, and evolve.
┌────────────────┐   ┌────────────────┐   ┌────────────────┐
│ Source Table 1 │──▶│ Model for Tab1 │──▶│ Downstream Mod │
└────────────────┘   └────────────────┘   └────────────────┘
┌────────────────┐   ┌────────────────┐   ┌────────────────┐
│ Source Table 2 │──▶│ Model for Tab2 │──▶│ Downstream Mod │
└────────────────┘   └────────────────┘   └────────────────┘
┌────────────────┐   ┌────────────────┐   ┌────────────────┐
│ Source Table 3 │──▶│ Model for Tab3 │──▶│ Downstream Mod │
└────────────────┘   └────────────────┘   └────────────────┘
Myth Busters - 3 Common Misconceptions
Quick: Do you think combining multiple source tables into one model is better for clarity? Commit yes or no.
Common Belief: Combining many source tables into one model is simpler and easier to manage.
Reality: Mixing multiple source tables in one model makes it harder to track data origins and debug issues.
Why it matters: This leads to tangled code and longer debugging times, reducing trust in data.
Quick: Do you think having many small models slows down dbt runs significantly? Commit yes or no.
Common Belief: More models always mean slower dbt runs and worse performance.
Reality: While more models add steps, dbt runs them in dependency order, parallelizes where it can, and incremental builds keep runtimes manageable.
Why it matters: Avoiding modular models for fear of slowness leads to messy, unmaintainable projects.
Quick: Is it always wrong to combine small related source tables into one model? Commit yes or no.
Common Belief: You must never combine source tables; one model per source table is absolute.
Reality: Sometimes combining small or tightly related tables improves simplicity without losing clarity.
Why it matters: Rigidly following the rule without context can cause unnecessary complexity.
Expert Zone
1
Some source tables may require multiple models for different stages like raw staging and cleaned data, which still respects modularity.
2
Materialization strategies (table, view, incremental) affect how one model per source table impacts performance and storage.
3
Documenting exceptions to the rule clearly is crucial for team understanding and project longevity.
When NOT to use
Avoid strict one model per source table when source tables are extremely small and always used together; instead, combine them for simplicity. Also, for very complex transformations, build intermediate models that combine base models rather than mixing raw sources directly.
Production Patterns
In production, teams create base models per source table as staging layers, then build marts or business logic models on top. They use dbt's DAG visualization to manage dependencies and incremental models to optimize performance.
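A common way to encode this layering is per-folder materialization defaults in dbt_project.yml. A sketch, assuming folders named staging and marts (project and folder names are illustrative):

```yaml
# dbt_project.yml (excerpt)
models:
  my_project:
    staging:
      +materialized: view    # cheap base models, one per source table
    marts:
      +materialized: table   # heavier business-logic models built on top
```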
Connections
Modular Programming
The one model per source table rule applies the modular programming principle to data modeling.
Understanding modular programming helps grasp why separating data transformations into small, focused models improves maintainability.
Data Lineage
One model per source table clarifies data lineage by creating clear, traceable steps from raw data to final outputs.
Knowing this connection helps you appreciate how modular models make auditing and debugging data easier.
Supply Chain Management
Just like supply chains track parts from raw materials to finished products, one model per source table tracks data from raw tables to final reports.
Seeing data pipelines as supply chains highlights the importance of clear, modular steps for quality and traceability.
Common Pitfalls
#1 Mixing multiple source tables in one model for convenience.
Wrong approach: SELECT * FROM source_table1 JOIN source_table2 ON ... -- all in one model
Correct approach: Give each source table its own staging model (a file containing just a SELECT from that one source), then join those models with ref() in a separate downstream model.
Root cause: Misunderstanding modularity and trying to shortcut by combining sources early.
#2 Ignoring model dependencies and running models in the wrong order.
Wrong approach: Running downstream models before base models are built, causing errors.
Correct approach: Use dbt run, which resolves the DAG and executes models in dependency order automatically.
Root cause: Not understanding dbt's DAG and dependency management.
#3 Over-applying the rule and creating too many tiny models unnecessarily.
Wrong approach: Creating separate models for trivial tables that are always used together, causing clutter.
Correct approach: Combine very small, tightly related tables into one model when it simplifies the project.
Root cause: Rigidly following the rule without considering project context.
Key Takeaways
One model per source table keeps data transformations clear, modular, and easier to maintain.
This rule helps track data lineage and debug issues faster by isolating each source's logic.
Building complex models should happen on top of these base models, not by mixing raw sources directly.
Balancing strict adherence with practical exceptions leads to better project design.
Understanding dbt's dependency management and materializations is key to optimizing performance with this rule.