Bird
Raised Fist0
dbtdata~10 mins

Seeds for static reference data in dbt - Step-by-Step Execution

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Concept Flow - Seeds for static reference data
Create CSV file with static data
Place CSV in 'data' folder
Define seed in dbt project
Run 'dbt seed' command
dbt loads CSV into database as table
Use seed table in models/queries
This flow shows how you create a CSV file with static data, place it in your dbt project, run dbt seed to load it into your database, and then use it in your data models.
Execution Sample
dbt
id,name,category
1,Apple,Fruit
2,Carrot,Vegetable
3,Banana,Fruit
This CSV file contains static reference data for items with their categories.
Execution Table
StepActionInput/CommandResult
1Create CSV fileid,name,category 1,Apple,Fruit 2,Carrot,Vegetable 3,Banana,FruitCSV file saved in 'data/items.csv'
2Place CSV in data folderMove 'items.csv' to 'data/''data/items.csv' available in project
3Run dbt seeddbt seedCSV loaded into database as table 'items'
4Use seed table in modelSELECT * FROM {{ ref('items') }}Query returns static reference data from seed table
5Modify CSV and rerun seedUpdate CSV and run dbt seedDatabase table 'items' updated with new data
6ExitNo more actionsStatic data ready for use in dbt models
💡 Static CSV data loaded into database as a seed table and ready for use.
Variable Tracker
VariableStartAfter Step 1After Step 3After Step 5Final
CSV fileNoneCreated with 3 rowsLoaded into database as table 'items'Updated with new dataFinal seed table in database
Database table 'items'NoneNoneCreated with CSV dataUpdated with new CSV dataReady for queries
Key Moments - 3 Insights
Why do we need to place the CSV file in the 'data' folder?
dbt looks for seed files only in the 'data' folder, so placing the CSV there allows 'dbt seed' to find and load it, as shown in execution_table step 2.
What happens when we run 'dbt seed'?
The command reads all CSV files in the 'data' folder and loads them into the database as tables, replacing existing seed tables, as seen in execution_table step 3.
Can we update the seed data after initial load?
Yes, by modifying the CSV file and rerunning 'dbt seed', the database table updates with new data, shown in execution_table step 5.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table, what is the result after running 'dbt seed' the first time?
ACSV file is created
BCSV file moved to 'data' folder
CCSV loaded into database as table 'items'
DQuery returns static reference data
💡 Hint
Check execution_table row 3 under 'Result'
At which step does the static data become available for queries in dbt models?
AStep 3
BStep 4
CStep 2
DStep 5
💡 Hint
Look at execution_table where the seed table is referenced in a query
If you update the CSV file but forget to run 'dbt seed' again, what happens?
ASeed table remains unchanged in database
BDatabase table updates automatically
Cdbt throws an error
DCSV file is deleted
💡 Hint
Refer to execution_table step 5 about updating CSV and rerunning seed
Concept Snapshot
Seeds in dbt are CSV files with static data.
Place CSVs in the 'data' folder.
Run 'dbt seed' to load CSVs as tables.
Use seed tables in models with {{ ref() }}.
Update CSV and rerun seed to refresh data.
Full Transcript
Seeds for static reference data in dbt involve creating CSV files with fixed data, placing them in the 'data' folder of your dbt project, and running the 'dbt seed' command. This command loads the CSV data into your database as tables. You can then use these seed tables in your dbt models by referencing them with the ref function. If you update the CSV files, rerunning 'dbt seed' updates the database tables accordingly. This process helps manage static data easily within your dbt workflows.

Practice

(1/5)
1. What is the main purpose of using seeds in dbt?
easy
A. To create dynamic tables based on SQL queries
B. To load static reference data from CSV files into your database
C. To schedule dbt runs automatically
D. To write Python scripts for data transformation

Solution

  1. Step 1: Understand what seeds are in dbt

    Seeds are CSV files that contain static reference data you want to load into your database.
  2. Step 2: Identify the main use of seeds

    Seeds let you easily add fixed data tables without writing SQL queries.
  3. Final Answer:

    To load static reference data from CSV files into your database -> Option B
  4. Quick Check:

    Seeds = static CSV data load [OK]
Hint: Seeds = fixed CSV data loaded as tables [OK]
Common Mistakes:
  • Confusing seeds with models that run SQL
  • Thinking seeds schedule dbt runs
  • Assuming seeds are for dynamic data
2. Which command do you run to load or refresh seed data in your database?
easy
A. dbt test
B. dbt run
C. dbt seed
D. dbt compile

Solution

  1. Step 1: Recall dbt commands related to seeds

    The command dbt seed loads CSV seed files into the database as tables.
  2. Step 2: Differentiate from other commands

    dbt run runs models, dbt test runs tests, and dbt compile compiles SQL but does not load seeds.
  3. Final Answer:

    dbt seed -> Option C
  4. Quick Check:

    Load seeds = dbt seed [OK]
Hint: Use 'dbt seed' to load CSV data tables [OK]
Common Mistakes:
  • Using 'dbt run' to load seeds
  • Confusing 'dbt test' with loading data
  • Thinking 'dbt compile' loads data
3. Given a seed CSV file countries.csv with columns id and name, what will be the output of this dbt model SQL?
select * from {{ ref('countries') }}
medium
A. A table with all rows and columns from countries.csv
B. Only the id column from countries.csv
C. An empty table because seeds are not loaded automatically
D. An error because seeds cannot be referenced

Solution

  1. Step 1: Understand how seeds are referenced in dbt

    Seeds become tables in the database and can be referenced using ref() like models.
  2. Step 2: Predict the query output

    The query selects all columns and rows from the seed table countries, so it returns the full CSV data.
  3. Final Answer:

    A table with all rows and columns from countries.csv -> Option A
  4. Quick Check:

    ref(seed) = full seed table [OK]
Hint: ref(seed_name) returns full seed table [OK]
Common Mistakes:
  • Thinking seeds cannot be referenced
  • Assuming seeds load empty tables
  • Expecting partial columns only
4. You ran dbt seed but your seed table did not update. Which of these is the most likely cause?
medium
A. You forgot to add the seed CSV file in the seeds folder
B. You ran dbt run instead of dbt seed
C. Your seed CSV file has syntax errors
D. You did not configure the seed in dbt_project.yml

Solution

  1. Step 1: Check seed discovery mechanism

    dbt automatically discovers and loads CSV files from the seeds/ folder with dbt seed.
  2. Step 2: Identify why table doesn't update

    If the CSV file is missing from the seeds/ folder, dbt seed runs successfully but skips that seed, leaving the table unchanged.
  3. Final Answer:

    You forgot to add the seed CSV file in the seeds folder -> Option A
  4. Quick Check:

    Seeds folder missing CSV = no update [OK]
Hint: Place seed CSVs in seeds/ folder for dbt seed [OK]
Common Mistakes:
  • Thinking seeds require config in dbt_project.yml
  • Confusing dbt run with dbt seed
  • CSV syntax errors (would cause explicit failure)
5. You want to use a seed file currencies.csv with columns code and symbol inside a model to join with a transactions table on currency_code. Which is the correct way to write the join in your model SQL?
hard
A. select t.*, c.symbol from transactions t join currencies c on t.currency_code = c.code
B. select t.*, c.symbol from transactions t join currencies.csv c on t.currency_code = c.code
C. select t.*, c.symbol from transactions t join seed('currencies') c on t.currency_code = c.code
D. select t.*, c.symbol from transactions t join {{ ref('currencies') }} c on t.currency_code = c.code

Solution

  1. Step 1: Recall how to reference seeds in dbt models

    Seeds are referenced using {{ ref('seed_name') }} to get the table name in SQL.
  2. Step 2: Identify the correct join syntax

    Joining transactions with {{ ref('currencies') }} correctly uses the seed table in the join.
  3. Final Answer:

    select t.*, c.symbol from transactions t join {{ ref('currencies') }} c on t.currency_code = c.code -> Option D
  4. Quick Check:

    Join seed with ref() = correct [OK]
Hint: Use ref('seed_name') to join seed tables in models [OK]
Common Mistakes:
  • Using raw CSV filename in SQL
  • Forgetting to use ref() for seeds
  • Trying to use a non-existent seed() function