
Staging, intermediate, and marts pattern in dbt - Deep Dive

Overview - Staging, intermediate, and marts pattern
What is it?
The staging, intermediate, and marts pattern is a way to organize data transformations in dbt projects. It breaks down the process into three layers: staging cleans and prepares raw data, intermediate applies business logic and combines data, and marts create final tables for analysis. This structure helps keep data workflows clear and manageable.
Why it matters
Without this pattern, data transformations can become messy and hard to maintain, leading to errors and slow analysis. Organizing work into layers makes it easier to find problems, reuse code, and deliver reliable data quickly. It helps teams work together smoothly and supports better decision-making with clean data.
Where it fits
Learners should first understand basic SQL and dbt concepts like models and dependencies. After mastering this pattern, they can explore advanced dbt features like snapshots, tests, and documentation to build robust data pipelines.
Mental Model
Core Idea
Breaking data transformations into clear layers—staging, intermediate, and marts—creates a clean, reusable, and understandable flow from raw data to final analysis tables.
Think of it like...
It's like cooking a meal: staging is washing and chopping ingredients, intermediate is cooking and mixing them into dishes, and marts are plating the food beautifully for guests to enjoy.
┌─────────────┐     ┌──────────────────┐     ┌──────────────┐
│   Staging   │────▶│   Intermediate   │────▶│    Marts     │
│ (clean raw) │     │ (business logic) │     │ (final data) │
└─────────────┘     └──────────────────┘     └──────────────┘
Build-Up - 7 Steps
1
Foundation: Understanding Raw Data Sources
Concept: Raw data is the starting point and often messy or inconsistent.
Raw data comes from places like databases or files. It may have missing values, duplicates, or inconsistent formats. Before using it, we need to clean and organize it.
Result
You recognize that raw data needs preparation before analysis.
Understanding raw data's imperfections explains why we need a staging layer to clean it first.
2
Foundation: Introduction to dbt Models
Concept: dbt models are SQL files that transform data step-by-step.
In dbt, each model is a SQL query saved as a file. Running dbt builds these models in order, creating tables or views. Models can depend on each other, forming a chain of transformations.
Result
You can create simple dbt models and understand dependencies.
Knowing models and dependencies is key to organizing transformations into layers.
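As a minimal sketch of this idea (model and column names such as stg_users are invented for illustration, not from any particular project), each model is just a SELECT in its own file, and one model builds on another through dbt's `ref()` function, which is how dbt learns the dependency order:

```sql
-- models/staging/stg_users.sql (illustrative name)
-- A model is just a SELECT; dbt turns it into a table or view.
select
    id as user_id,
    email
from {{ source('app', 'users') }}
```

```sql
-- models/marts/user_counts.sql (illustrative name)
-- ref() both points at stg_users and records the dependency,
-- so dbt always builds stg_users first.
select count(*) as total_users
from {{ ref('stg_users') }}
```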
3
Intermediate: Creating the Staging Layer
🤔 Before reading on: do you think staging models should contain complex business logic or just clean raw data? Commit to your answer.
Concept: The staging layer cleans and standardizes raw data without adding business rules.
Staging models select raw tables, fix data types, rename columns for clarity, and handle missing values. They do not combine data or apply business logic. This keeps raw data consistent and easy to use.
Result
You have clean, consistent tables ready for further processing.
Separating cleaning from logic prevents errors and makes debugging easier.
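A sketch of what such a staging model might look like (source, table, and column names are invented for illustration):

```sql
-- models/staging/stg_orders.sql (illustrative)
-- Cleaning only: rename, cast, standardize, fill gaps.
-- No joins, no business rules.
select
    order_id,
    customer_id,
    cast(order_ts as timestamp) as ordered_at,   -- fix the data type
    lower(trim(status)) as order_status,         -- standardize formatting
    coalesce(total_amount, 0) as total_amount    -- handle missing values
from {{ source('shop', 'raw_orders') }}
```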
4
Intermediate: Building the Intermediate Layer
🤔 Before reading on: do you think intermediate models combine data from multiple sources or just one? Commit to your answer.
Concept: Intermediate models apply business logic and combine staging tables.
This layer joins staging tables, filters data, calculates new fields, and applies rules specific to the business. It prepares data for final reporting but is not yet the final output.
Result
You create tables that reflect business concepts and are ready for analysis.
Isolating business logic here makes it reusable and easier to update.
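As a sketch of this layer (model and column names such as stg_customers and stg_orders are invented for illustration), an intermediate model joins staging tables and encodes a business rule:

```sql
-- models/intermediate/int_customer_orders.sql (illustrative)
-- Combines staging tables and applies a business definition
-- of a "completed" order. Still not the final reporting table.
select
    c.customer_id,
    c.customer_name,
    o.order_id,
    o.total_amount,
    o.order_status = 'completed' as is_completed  -- business logic lives here
from {{ ref('stg_customers') }} as c
join {{ ref('stg_orders') }} as o
    on o.customer_id = c.customer_id
```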
5
Intermediate: Designing the Marts Layer
Concept: Marts are the final tables tailored for reporting and analysis.
Marts select from intermediate models and shape data for specific use cases like sales reports or customer insights. They often aggregate data and optimize for fast queries.
Result
You produce clean, user-friendly tables ready for dashboards or analysts.
Having a dedicated layer for final outputs improves performance and clarity.
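A sketch of a mart built on such an intermediate model (all names are invented for illustration; `config()` here requests a table materialization for query speed):

```sql
-- models/marts/fct_customer_revenue.sql (illustrative)
{{ config(materialized='table') }}

-- Aggregates an intermediate model into an analysis-ready table.
select
    customer_id,
    customer_name,
    count(order_id) as order_count,
    sum(total_amount) as total_revenue
from {{ ref('int_customer_orders') }}
group by customer_id, customer_name
```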
6
Advanced: Managing Dependencies and Testing
🤔 Before reading on: do you think tests should be applied at staging, intermediate, or marts layers? Commit to your answer.
Concept: Testing and managing dependencies ensure data quality across layers.
dbt allows tests like uniqueness or null checks on any model. Applying tests early in staging catches errors quickly. Managing dependencies ensures models build in the right order and changes propagate safely.
Result
You maintain reliable data pipelines with fewer errors.
Early testing and clear dependencies prevent costly mistakes downstream.
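In dbt, such tests are declared in YAML alongside the models. A minimal sketch (model and column names are illustrative):

```yaml
# models/staging/schema.yml (illustrative)
version: 2

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:        # dbt's built-in generic tests
          - unique
          - not_null
      - name: order_status
        tests:
          - accepted_values:
              values: ['completed', 'pending', 'cancelled']
```

Because these run against the staging layer, bad source data is caught before any business logic consumes it.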
7
Expert: Optimizing for Performance and Maintenance
🤔 Before reading on: do you think layering always improves performance or can it sometimes add overhead? Commit to your answer.
Concept: Layering helps maintainability but requires balancing performance and complexity.
Too many layers can slow down builds and queries. Experts optimize by materializing models appropriately (tables, views, incremental), pruning unnecessary layers, and documenting dependencies. They also modularize code for reuse.
Result
You build efficient, maintainable dbt projects that scale well.
Knowing when to simplify or optimize layers is key to professional data engineering.
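One common way to express these choices is per-folder materialization config in dbt_project.yml. A sketch, assuming a project named my_project (the name and the exact split are illustrative, not prescriptive):

```yaml
# dbt_project.yml excerpt (illustrative)
models:
  my_project:
    staging:
      +materialized: view        # cheap to rebuild, always reflects sources
    intermediate:
      +materialized: ephemeral   # inlined into downstream queries, no object built
    marts:
      +materialized: table       # precomputed for fast analyst queries
```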
Under the Hood
dbt compiles SQL models into executable queries, respecting dependencies to build tables or views in order. The staging layer runs first, cleaning raw data into consistent tables. Intermediate models then run, applying business logic and joining staging tables. Finally, marts run to produce analysis-ready tables. dbt manages this flow automatically, tracking changes and rebuilding affected models.
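Concretely, that compilation step mostly means resolving `ref()` and `source()` calls into real relation names. As a sketch (names invented; the database and schema prefix depend on your connection profile and configs), a model written as `select user_id from {{ ref('stg_users') }}` might compile to roughly:

```sql
-- compiled output (approximate; actual database/schema names vary by project)
select user_id
from analytics.staging.stg_users
```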
Why designed this way?
This layered design was created to separate concerns: cleaning, logic, and presentation. It makes projects easier to understand, test, and maintain. Alternatives like monolithic SQL scripts were hard to debug and reuse. The pattern supports collaboration and incremental development.
┌─────────────┐
│   Raw Data  │
└─────┬───────┘
      │
┌─────▼───────┐
│  Staging    │
│ (cleaning)  │
└─────┬───────┘
      │
┌─────▼────────────┐
│   Intermediate   │
│ (business logic) │
└─────┬────────────┘
      │
┌─────▼───────┐
│   Marts     │
│ (final data)│
└─────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Is it okay to put business logic directly in staging models? Commit yes or no.
Common Belief: Many think staging models should include business logic to save time.
Reality: Staging models should only clean and standardize raw data; business logic belongs in intermediate models.
Why it matters: Mixing logic in staging makes debugging harder and reduces reusability of clean data.
Quick: Do you think marts should always be simple views? Commit yes or no.
Common Belief: Some believe marts should always be views to avoid storage costs.
Reality: Marts are often materialized as tables for performance, especially with large data or complex queries.
Why it matters: Using only views can cause slow queries and poor user experience.
Quick: Does layering always improve performance? Commit yes or no.
Common Belief: Many assume layering automatically makes data pipelines faster.
Reality: Layering improves maintainability but can add overhead if not optimized properly.
Why it matters: Ignoring performance tradeoffs can lead to slow builds and queries.
Quick: Is it safe to skip testing in staging because data is raw? Commit yes or no.
Common Belief: Some think testing raw data is unnecessary since it comes from trusted sources.
Reality: Raw data often has errors; testing in staging catches issues early.
Why it matters: Skipping tests leads to bad data propagating downstream, causing wrong decisions.
Expert Zone
1
Staging models often mirror source tables but rename columns to a consistent naming convention, which is crucial for downstream clarity.
2
Intermediate models can be reused across multiple marts, enabling modular and DRY (Don't Repeat Yourself) transformations.
3
Materialization strategies (table, view, incremental) must be chosen carefully per layer to balance build time and query speed.
When NOT to use
This pattern is less suitable for very small projects where layering adds unnecessary complexity. In such cases, simpler flat models or direct transformations may be better. Also, real-time streaming data pipelines often require different architectures.
Production Patterns
In production, teams use this pattern combined with automated testing, documentation, and CI/CD pipelines. They version control dbt projects, use incremental models for large datasets, and separate environments for development and production to ensure data quality and reliability.
Connections
Software Engineering Layered Architecture
This data layering pattern mirrors software design layers like presentation, business logic, and data access.
Understanding software layering helps grasp why separating data cleaning, logic, and presentation improves maintainability and collaboration.
ETL Pipelines
The pattern is a modern, modular approach to traditional Extract-Transform-Load processes.
Knowing ETL basics clarifies how dbt layers replace monolithic transformations with reusable, testable steps.
Cooking Process
Like cooking stages (prep, cook, plate), data transformations progress through cleaning, logic, and final presentation.
This analogy helps understand the importance of order and separation in complex workflows.
Common Pitfalls
#1 Putting business logic in staging models.
Wrong approach (staging model with business logic baked in):
select id, case when status = 'active' then 1 else 0 end as is_active from raw.users
Correct approach (staging only cleans and renames columns):
select id, status from raw.users
Then apply the business logic in an intermediate model:
select id, case when status = 'active' then 1 else 0 end as is_active from staging.users
Root cause: Confusing cleaning with business logic mixes concerns and makes maintenance harder.
#2 Materializing all models as views, causing slow queries.
Wrong approach (dbt_project.yml):
models:
  marts:
    materialized: view
Correct approach:
models:
  marts:
    materialized: table
Root cause: Not considering query performance and data size leads to inefficient data access.
#3 Skipping tests on staging models.
Wrong approach: No tests defined on staging models.
Correct approach (in a schema.yml covering the staging models):
models:
  - name: stg_users   # example staging model
    columns:
      - name: id
        tests:
          - unique
          - not_null
Root cause: Assuming raw data is perfect causes errors to propagate unnoticed.
Key Takeaways
Organizing data transformations into staging, intermediate, and marts layers creates clear separation of concerns.
Staging cleans raw data without applying business rules, ensuring consistent inputs for later steps.
Intermediate models apply business logic and combine data, making transformations reusable and testable.
Marts produce final tables optimized for analysis and reporting, improving user experience.
Balancing layering with performance and testing at each stage is essential for reliable, maintainable data pipelines.