
When to Use Seeds in dbt: Practical Guide and Examples

Use seeds in dbt when you need to load small, static CSV files as tables in your data warehouse. Seeds are perfect for reference data or lookup tables that rarely change and support your transformations without needing external sources.

⚙️

How It Works

Seeds in dbt are like small, fixed data files you keep inside your project. Imagine you have a list of country codes or product categories that don't change often. Instead of connecting to an external database or API, you save this data as a CSV file inside your dbt project.

When you run the dbt seed command (dbt build also includes seeds), dbt reads these CSV files and loads them as tables in your data warehouse. A plain dbt run builds models only and does not load seeds. Once loaded, you can join or reference this static data in your models with ref(). Think of seeds as small lookup tables that ship with your project.
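To make the loading step concrete, the sketch below shows roughly what a seed load amounts to in plain SQL for a hypothetical countries.csv file. It is a conceptual illustration only: the statements dbt actually issues depend on the warehouse adapter, and the text column type is an assumption, since dbt infers types from the CSV contents.

```sql
-- Conceptual sketch only: not the exact SQL dbt generates.
-- Table and column names mirror a hypothetical countries.csv seed;
-- the text column type is an assumption.
create table countries (
  country_code text,
  country_name text
);

-- Each CSV row becomes one inserted row.
insert into countries (country_code, country_name) values
  ('US', 'United States'),
  ('CA', 'Canada'),
  ('MX', 'Mexico');
```

The point is that a seed ends up as an ordinary table, which is why models can select from it like any other relation.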

💻

Example

This example shows how to add a seed CSV file and use it in a dbt model.

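File: seeds/countries.csv (seeds/ is the default seed directory since dbt 1.0; older projects used data/)

csv

country_code,country_name
US,United States
CA,Canada
MX,Mexico

File: models/country_info.sql

sql

-- Select every row from the countries seed.
-- ref() takes the CSV file name without the .csv extension.
select
  country_code,
  country_name
from {{ ref('countries') }}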

Output

country_code | country_name
-------------|--------------
US           | United States
CA           | Canada
MX           | Mexico
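Beyond a plain select, the more common pattern is to join the seed to another model to enrich it. The following is a minimal sketch, assuming a hypothetical upstream model named stg_orders with order_id and country_code columns; neither the model nor the file name comes from this guide.

```sql
-- File: models/orders_with_country.sql (hypothetical model name)
-- Assumes a hypothetical upstream model stg_orders with
-- order_id and country_code columns.
select
  o.order_id,
  o.country_code,
  c.country_name  -- null when the code is missing from the seed
from {{ ref('stg_orders') }} as o
left join {{ ref('countries') }} as c
  on o.country_code = c.country_code
```

A left join keeps every order even when its country code has no match in the seed, which makes gaps in the lookup data visible instead of silently dropping rows.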

🎯

When to Use

Use seeds when you have small, static datasets that support your transformations. Common cases include:

Reference tables like country codes, product categories, or status codes.
Lookup tables that rarely change and are not worth maintaining in a separate source system.
Data that you want to version control alongside your dbt project for easy updates and tracking.

Seeds are a poor fit for large or frequently changing data: dbt seed reloads the entire CSV on every run, which is slow for big files, and every change to the data requires editing and committing the file.
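For data that is too large or too volatile for a seed, one common alternative is to load it into the warehouse with a separate ingestion tool and read it through source() instead of ref(). This is a sketch under assumptions: the source name raw and the table name events are placeholders, and the source would need to be declared in the project's YAML before this model compiles.

```sql
-- File: models/stg_events.sql (hypothetical model name)
-- Assumes a source named raw with a table named events has been
-- declared in the project's YAML; both names are placeholders.
select *
from {{ source('raw', 'events') }}
```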

✅

Key Points

Seeds load CSV files as tables inside your data warehouse.
They are best for small, static reference data.
Seeds simplify managing lookup data within your dbt project.
They are not suitable for large or frequently updated datasets.

✅

Key Takeaways

Seeds in dbt load small CSV files as tables for static reference data.

Use seeds for lookup tables that rarely change and support your models.

Seeds keep static data versioned and easy to manage within your project.

Avoid seeds for large or frequently updated datasets to maintain performance.