0
0
DbtHow-ToBeginner ยท 3 min read

How to Create a Staging Model in dbt: Simple Guide

To create a staging model in dbt, define a SQL file in the models/staging folder that selects and cleans raw source data. Use ref() to reference source tables and apply basic transformations to prepare data for downstream models.
๐Ÿ“

Syntax

A staging model in dbt is a SQL file that selects data from raw sources and prepares it for further modeling. It typically lives in the models/staging/ directory.

Key parts:

  • select: Choose columns from the source table.
  • from {{ ref('raw_users') }}: Reference the raw source table using ref().
  • Basic transformations: Rename columns, cast types, or filter rows.
sql
select
  id as user_id,
  lower(email) as email,
  created_at::timestamp as created_at
from {{ ref('raw_users') }}
๐Ÿ’ป

Example

This example shows a staging model that cleans and prepares user data from a raw source table called raw_users. It renames columns, converts emails to lowercase, and casts timestamps.

sql
select
  id as user_id,
  lower(email) as email,
  created_at::timestamp as created_at
from {{ ref('raw_users') }}
Output
A table with columns: user_id (int), email (string, lowercase), created_at (timestamp)
โš ๏ธ

Common Pitfalls

Common mistakes when creating staging models include:

  • Not using ref() to reference source tables, which breaks dependency tracking.
  • Applying complex business logic instead of simple cleaning; staging models should be simple.
  • Not placing staging models in a dedicated staging/ folder, which reduces project clarity.
sql
/* Wrong: Direct table name without ref() */
select * from raw_users

/* Right: Use ref() for source table */
select * from {{ ref('raw_users') }}
๐Ÿ“Š

Quick Reference

StepDescription
Create SQL filePlace in models/staging/ folder
Reference sourceUse {{ ref('raw_users') }} to select raw data
Clean dataRename columns, cast types, simple filters
Keep simpleAvoid complex logic in staging models
TestRun dbt to ensure model builds correctly
โœ…

Key Takeaways

Create staging models as simple SQL selects that clean raw data.
Always use ref() to reference source tables for dependency tracking.
Place staging models in a dedicated models/staging/ folder.
Avoid complex transformations in staging; keep them for later models.
Test your staging models by running dbt build or run commands.