
dbt project structure - Deep Dive

Overview - dbt project structure
What is it?
A dbt project structure is the organized way files and folders are arranged to build, test, and document data transformations using dbt. It includes folders for models, tests, macros, and configurations that work together to create a clear, maintainable data pipeline. This structure helps teams collaborate and ensures data workflows are easy to understand and update. It acts like a blueprint for how dbt runs and manages your data transformations.
Why it matters
Without a clear dbt project structure, data transformations become messy and hard to manage, leading to errors and confusion. A well-organized structure saves time, reduces mistakes, and makes it easier for teams to work together on data projects. It also helps ensure data quality and consistency, which is critical for making reliable business decisions. Imagine trying to build a house without a blueprint; the project would be chaotic and inefficient.
Where it fits
Before learning dbt project structure, you should understand basic SQL and the concept of data transformation. After mastering the structure, you can learn advanced dbt features like hooks, packages, and deployment automation. This topic fits early in the dbt learning path, right after setting up dbt and before building complex models and tests.
Mental Model
Core Idea
A dbt project structure is like a well-organized kitchen where every tool and ingredient has its place, making cooking (data transformation) efficient and error-free.
Think of it like...
Think of a dbt project structure as a kitchen layout: the stove is where cooking happens (models folder), the pantry stores ingredients (data sources), the recipe book holds instructions (macros), and the cleaning supplies (tests) ensure everything stays clean and safe. If these are scattered randomly, cooking becomes slow and mistakes happen.
┌────────────────────────────────┐
│          dbt Project           │
├─────────────────┬──────────────┤
│ models/         │ SQL files    │
│                 │ (transform)  │
├─────────────────┼──────────────┤
│ tests/          │ Data checks  │
├─────────────────┼──────────────┤
│ macros/         │ Reusable     │
│                 │ SQL snippets │
├─────────────────┼──────────────┤
│ snapshots/      │ Data version │
│                 │ history      │
├─────────────────┼──────────────┤
│ seeds/          │ Static CSV   │
│                 │ data         │
├─────────────────┼──────────────┤
│ dbt_project.yml │ Config file  │
└─────────────────┴──────────────┘
Build-Up - 6 Steps
1
Foundation: Understanding dbt Project Basics
Concept: Learn what a dbt project is and its main components.
A dbt project is a folder containing all files needed to build your data transformations. The main file is dbt_project.yml, which tells dbt how to run your project. Inside the project, you have folders like models for SQL files that transform data, and tests to check data quality.
Result
You can identify the key files and folders in a dbt project and understand their basic roles.
Knowing the basic layout helps you navigate and organize your work efficiently from the start.
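As a rough illustration of the layout described above, the Python sketch below creates an empty project skeleton in a temporary directory. The names my_project and my_profile are placeholders, and the path settings written into dbt_project.yml simply restate the conventional folder names. In practice the dbt init command generates a starter project for you; this sketch only makes the folder-to-purpose mapping tangible.

```python
from pathlib import Path
import tempfile

# Folder names follow the diagram above.
FOLDERS = ["models", "tests", "macros", "snapshots", "seeds"]

# Minimal project file; "my_project" and "my_profile" are placeholder names.
PROJECT_YML = """\
name: my_project
version: "1.0.0"
config-version: 2
profile: my_profile
model-paths: ["models"]
test-paths: ["tests"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
seed-paths: ["seeds"]
"""


def scaffold(root: Path) -> Path:
    """Create an empty dbt-style project skeleton under root and return its path."""
    project = root / "my_project"
    for folder in FOLDERS:
        (project / folder).mkdir(parents=True, exist_ok=True)
    (project / "dbt_project.yml").write_text(PROJECT_YML)
    return project


# Build the skeleton in a throwaway directory and list what was created.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp))
    for path in sorted(created.rglob("*")):
        print(path.relative_to(created))
```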
2
Foundation: Role of the Models Folder
Concept: The models folder contains SQL files that define data transformations.
Inside the models folder, each SQL file defines one model: a SELECT statement that represents a transformation step. When dbt runs, it compiles these files into SQL queries that create tables or views in your data warehouse. Organizing models into subfolders helps keep related transformations together.
Result
You understand how to write and organize transformation SQL files in dbt.
Recognizing models as the core transformation logic clarifies how dbt builds your data pipeline.
3
Intermediate: Using Tests and Seeds Folders
🤔 Before reading on: do you think tests only check data after transformations or can they also check raw data? Commit to your answer.
Concept: Tests check data quality; seeds provide static data inputs.
The tests folder holds custom checks written as SQL queries that return the rows violating a rule. Common checks, like ensuring a column has no nulls or duplicates, are usually declared in YAML files next to the models they cover. Seeds are CSV files stored in the seeds folder that dbt can load into your warehouse as tables. This helps when you want to include static reference data in your transformations.
Result
You can add data quality checks and load static data into your dbt project.
Understanding tests and seeds extends your control over data quality and input, making your pipeline more reliable.
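The sketch below imitates both ideas with Python's built-in sqlite3 module instead of a real warehouse. It loads a small CSV the way a seed would be loaded, then runs two checks written as queries that select the offending rows, so a check passes when the query returns nothing. The country_codes data and the helper names load_seed and failing_rows are invented for illustration; dbt itself handles these steps through its own commands.

```python
import csv
import io
import sqlite3

# Hypothetical seed content; in a real project this would be a CSV file in seeds/.
SEED_CSV = """code,name
US,United States
DE,Germany
FR,France
"""


def load_seed(conn, table, csv_text):
    """Load CSV text into a new table, roughly what loading a seed amounts to."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE {table} ({columns})")
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", data)


def failing_rows(conn, test_sql):
    """Run a check query; the check passes when no rows come back."""
    return conn.execute(test_sql).fetchall()


# Each check selects the rows that break the rule.
NOT_NULL_TEST = "SELECT code FROM country_codes WHERE code IS NULL"
UNIQUE_TEST = "SELECT code FROM country_codes GROUP BY code HAVING COUNT(*) > 1"

conn = sqlite3.connect(":memory:")
load_seed(conn, "country_codes", SEED_CSV)
for name, sql in [("not_null", NOT_NULL_TEST), ("unique", UNIQUE_TEST)]:
    status = "PASS" if not failing_rows(conn, sql) else "FAIL"
    print(name, status)
```

Writing a check as "select the bad rows" keeps it declarative: the same query both detects the failure and shows which records caused it.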
4
Intermediate: Macros Folder and Reusable SQL
🤔 Before reading on: do you think macros are just comments or do they actually run code? Commit to your answer.
Concept: Macros are reusable snippets of templated SQL that simplify complex logic.
Macros live in the macros folder and work like functions in programming. They are written in Jinja, dbt's templating language, so you write a piece of SQL once and call it from any model. This reduces repetition and makes your project easier to maintain and update.
Result
You can create and use macros to write cleaner, DRY (Don't Repeat Yourself) SQL code.
Knowing how to use macros helps you write scalable and maintainable dbt projects.
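One way to picture a macro is as an ordinary function that returns SQL text before the query is ever sent to the warehouse. The Python sketch below mimics that idea with plain string formatting; it is an analogy only, since real dbt macros are written in Jinja, and the names month_start and render_model are made up for this example.

```python
def month_start(column: str) -> str:
    """Return a SQL fragment, much as a macro renders text at compile time."""
    return f"date_trunc('month', {column})"


def render_model(template: str, **fragments: str) -> str:
    """Fill named placeholders in a model template with already-rendered SQL."""
    return template.format(**fragments)


# The "model" reuses the fragment instead of repeating the expression by hand.
orders_by_month = render_model(
    "SELECT {month} AS order_month, COUNT(*) AS orders FROM orders GROUP BY 1",
    month=month_start("order_date"),
)
print(orders_by_month)
```

If the date logic ever changes, only month_start needs editing, which is the same maintenance benefit macros give across many models.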
5
Advanced: Configuring dbt_project.yml
🤔 Before reading on: do you think dbt_project.yml only sets folder names or can it control model behavior too? Commit to your answer.
Concept: dbt_project.yml configures project settings and model behavior.
This YAML file defines project-wide settings such as the project name and version, folder paths, and default model materializations (table, view). You can also set configurations per folder or per model here, controlling how dbt builds each part of your project.
Result
You can customize how dbt runs your project and manages models.
Mastering dbt_project.yml lets you tailor your project to fit specific needs and optimize performance.
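Folder-level settings cascade: a setting on a folder applies to everything beneath it unless a deeper folder overrides it. The Python sketch below models that lookup for a hypothetical models: block, shown in the comment. The project name my_project, the folder names, and the function resolve_materialization are assumptions for illustration, and the logic is a simplified picture rather than dbt's actual resolution code.

```python
# Mirrors the models: block of a hypothetical dbt_project.yml:
#   models:
#     my_project:
#       +materialized: view
#       marts:
#         +materialized: table
MODELS_CONFIG = {
    "my_project": {
        "+materialized": "view",
        "marts": {"+materialized": "table"},
    }
}


def resolve_materialization(config, project, model_path, default="view"):
    """Walk from the project level down the model's folders; the deepest setting wins."""
    node = config.get(project, {})
    result = node.get("+materialized", default)
    for folder in model_path.split("/")[:-1]:  # skip the file name itself
        node = node.get(folder, {})
        result = node.get("+materialized", result)
    return result


print(resolve_materialization(MODELS_CONFIG, "my_project", "staging/stg_orders.sql"))     # view
print(resolve_materialization(MODELS_CONFIG, "my_project", "marts/finance/revenue.sql"))  # table
```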
6
Expert: Advanced Folder Structures and Modularization
🤔 Before reading on: do you think deeply nested folders improve or complicate dbt projects? Commit to your answer.
Concept: Organizing models into nested folders and packages supports large, complex projects.
In big projects, you can create nested folders inside models to separate domains or business areas. You can also use dbt packages to share code across projects. This modular approach improves collaboration and code reuse but requires careful planning to avoid complexity.
Result
You can design scalable dbt projects that support team collaboration and code sharing.
Understanding modularization prepares you to manage real-world, large-scale data transformation projects effectively.
Under the Hood
dbt reads the project files and compiles SQL models into executable queries. It uses dbt_project.yml to understand configurations and folder paths. Depending on the command, dbt then builds models in dependency order (dbt run), executes tests (dbt test), loads seeds (dbt seed), or captures snapshots (dbt snapshot). Macros are expanded inline during compilation, allowing dynamic SQL generation. Because the whole project is plain text files, it can be kept under version control, which makes transformations reproducible.
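To make "dependency order" concrete, the Python sketch below scans model SQL for ref() calls, which is how one dbt model points at another, and sorts the models so that each one comes after the models it references. The model names and SQL are invented, the regular expression is a simplification of real Jinja parsing, and graphlib requires Python 3.9 or later.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models, keyed by model name, with their SQL bodies.
MODELS = {
    "stg_orders": "SELECT * FROM raw.orders",
    "stg_customers": "SELECT * FROM raw.customers",
    "customer_orders": (
        "SELECT c.id, COUNT(o.id) AS n_orders "
        "FROM {{ ref('stg_customers') }} c "
        "LEFT JOIN {{ ref('stg_orders') }} o ON o.customer_id = c.id GROUP BY 1"
    ),
    "top_customers": "SELECT * FROM {{ ref('customer_orders') }} WHERE n_orders > 10",
}

# Matches {{ ref('name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def build_order(models):
    """Return model names so that every model comes after the models it refs."""
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


order = build_order(MODELS)
print(order)
```

Models with no dependency between them, such as the two staging models here, may appear in either order; only the relative position of a model and its upstream references is fixed.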
Why designed this way?
dbt was designed to bring software engineering best practices to data transformation. The project structure enforces organization, modularity, and clarity, making complex data workflows manageable. Alternatives like unstructured SQL scripts were error-prone and hard to maintain, so dbt’s structure solves these problems by standardizing project layout and behavior.
┌───────────────────────────────┐
│        dbt CLI Command        │
└───────────────┬───────────────┘
                │
      Reads dbt_project.yml
                │
┌───────────────▼───────────────┐
│  Loads Models, Macros, Tests  │
│  from project folders         │
└───────────────┬───────────────┘
                │
    Compiles SQL with macros
                │
┌───────────────▼───────────────┐
│ Executes SQL in Data Warehouse│
│ Runs Tests and Snapshots      │
└───────────────────────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do you think the models folder can contain any file type, like images or text? Commit to yes or no.
Common Belief: The models folder can hold any files related to the project.
Reality: The models folder is meant for model files: the SQL files that define transformations and the YAML files that document and test them. Other files belong elsewhere.
Why it matters: Stray files clutter the folder that dbt scans for models, and any stray .sql file placed there is treated as a model.
Quick: Do you think tests in dbt only run once during project setup? Commit to yes or no.
Common Belief: Tests are one-time checks to validate data when the project is created.
Reality: Tests run every time you invoke dbt test (or dbt build), so they can check data quality continuously. Note that dbt run on its own does not execute them.
Why it matters: Assuming tests run once leads to ignoring data quality issues that arise later.
Quick: Do you think macros are just comments or documentation? Commit to yes or no.
Common Belief: Macros are only for documenting SQL code and do not affect execution.
Reality: Macros are executable Jinja code that dbt evaluates during compilation to generate dynamic SQL.
Why it matters: Misunderstanding macros leads to missing powerful ways to simplify and reuse code.
Quick: Do you think dbt_project.yml only sets folder names and nothing else? Commit to yes or no.
Common Belief: dbt_project.yml is just a simple file to tell dbt where folders are.
Reality: It controls many settings, including model materializations, folder paths, and folder-level configurations.
Why it matters: Ignoring this file’s power limits your ability to customize and optimize dbt runs.
Expert Zone
1
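Model folder structure shapes configuration inheritance and model selection (for example, building a whole folder in one run). Any effect of nesting depth on compilation time is usually minor next to the number and complexity of models, so favor clarity when choosing a layout.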
2
Macros can accept arguments and use Jinja control flow, enabling complex dynamic SQL generation beyond simple reuse.
3
dbt_project.yml can vary settings by environment, typically through Jinja expressions on the active target (such as target.name), allowing different behaviors in development vs production.
When NOT to use
For very simple or one-off SQL scripts, using dbt and its project structure may be overkill. Alternatives like direct SQL scripts or lightweight ETL tools might be better. Also, if your data transformations require complex procedural logic, a full ETL tool or custom code might be more suitable.
Production Patterns
In production, teams use modular folder structures to separate business domains, enforce strict testing in tests folders, and use macros for common logic like date handling. They automate dbt runs with CI/CD pipelines and use dbt packages to share reusable code across projects.
Connections
Software Engineering Project Structure
dbt project structure builds on the same principles of organizing code and resources for clarity and maintainability.
Understanding software project organization helps grasp why dbt enforces a clear folder and file layout.
Modular Programming
Macros and folder modularization in dbt mirror modular programming concepts in software development.
Knowing modular programming clarifies how to write reusable and maintainable SQL code in dbt.
Kitchen Organization
Like organizing a kitchen for efficient cooking, dbt project structure organizes files for efficient data transformation.
This cross-domain connection highlights the universal value of good organization for complex tasks.
Common Pitfalls
#1 Placing unrelated files inside the models folder.
Wrong approach:
models/readme.txt
models/image.png
Correct approach:
docs/readme.txt
assets/image.png
Root cause: Not realizing that the models folder is meant for model files (SQL models and their YAML properties), not general project assets.
#2 Not running tests regularly, assuming data is always clean.
Wrong approach:
dbt run  # never followed by dbt test
Correct approach:
dbt run
dbt test
Root cause: Underestimating the importance of continuous data quality checks.
#3 Writing repeated SQL code instead of using macros.
Wrong approach:
SELECT date_trunc('month', order_date) FROM orders -- repeated in many models
Correct approach:
-- in a file under macros/
{% macro month_start(date) %}
  date_trunc('month', {{ date }})
{% endmacro %}
-- in a model
SELECT {{ month_start('order_date') }} FROM orders
Root cause: Not knowing how to create and use macros for reusable SQL.
Key Takeaways
A clear dbt project structure organizes your data transformation files for easy understanding and maintenance.
The models folder holds SQL files that define how raw data becomes useful insights.
The tests and seeds folders help ensure data quality and provide static reference data.
Macros enable reusable SQL code, reducing repetition and errors.
The dbt_project.yml file controls project-wide settings and behavior, making your project flexible and powerful.