How to Create a Staging Model in dbt: Simple Guide
To create a staging model in
dbt, define a SQL file in the models/staging folder that selects and cleans raw source data. Use ref() to reference source tables and apply basic transformations to prepare data for downstream models.Syntax
A staging model in dbt is a SQL file that selects data from raw sources and prepares it for further modeling. It typically lives in the models/staging/ directory.
Key parts:
select: Choose columns from the source table.from {{ ref('raw_users') }}: Reference the raw source table usingref().- Basic transformations: Rename columns, cast types, or filter rows.
sql
select id as user_id, lower(email) as email, created_at::timestamp as created_at from {{ ref('raw_users') }}
Example
This example shows a staging model that cleans and prepares user data from a raw source table called raw_users. It renames columns, converts emails to lowercase, and casts timestamps.
sql
select id as user_id, lower(email) as email, created_at::timestamp as created_at from {{ ref('raw_users') }}
Output
A table with columns: user_id (int), email (string, lowercase), created_at (timestamp)
Common Pitfalls
Common mistakes when creating staging models include:
- Not using
ref()to reference source tables, which breaks dependency tracking. - Applying complex business logic instead of simple cleaning; staging models should be simple.
- Not placing staging models in a dedicated
staging/folder, which reduces project clarity.
sql
/* Wrong: Direct table name without ref() */ select * from raw_users /* Right: Use ref() for source table */ select * from {{ ref('raw_users') }}
Quick Reference
| Step | Description |
|---|---|
| Create SQL file | Place in models/staging/ folder |
| Reference source | Use {{ ref('raw_users') }} to select raw data |
| Clean data | Rename columns, cast types, simple filters |
| Keep simple | Avoid complex logic in staging models |
| Test | Run dbt to ensure model builds correctly |
Key Takeaways
Create staging models as simple SQL selects that clean raw data.
Always use ref() to reference source tables for dependency tracking.
Place staging models in a dedicated models/staging/ folder.
Avoid complex transformations in staging; keep them for later models.
Test your staging models by running dbt build or run commands.