When to Use Seeds in dbt: Practical Guide and Examples
Use seeds in dbt when you need to load small, static CSV files as tables in your data warehouse. Seeds are well suited to reference data or lookup tables that rarely change and that support your transformations without needing external sources.
How It Works
Seeds in dbt are like small, fixed data files you keep inside your project. Imagine you have a list of country codes or product categories that don't change often. Instead of connecting to an external database or API, you save this data as a CSV file inside your dbt project.
When you run the dbt seed command, dbt reads these CSV files and loads them as tables in your data warehouse. You can then reference this static data from your models with ref(), just like any other model, and join it to the rest of your data. Think of seeds as your project's built-in lookup tables for fixed data.
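To make the loading step concrete, the sketch below shows roughly what a seed load amounts to for a two-column country list. It is a conceptual illustration, not the statements dbt actually issues: the table name, the column types, and the exact create and insert strategy are assumptions and vary by warehouse adapter.

```sql
-- Conceptual sketch only: dbt generates its own statements, and the
-- column types shown here (text) are assumed for illustration.
create table countries (
    country_code text,
    country_name text
);

-- Every row of the CSV is written on each load, which is why seeds
-- suit small files.
insert into countries (country_code, country_name) values
    ('US', 'United States'),
    ('CA', 'Canada'),
    ('MX', 'Mexico');
```

Because the whole file is written each time, the cost of a seed load grows with the size of the CSV.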
Example
This example shows how to add a seed CSV file and use it in a dbt model.
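File: seeds/countries.csv (seeds/ is the default seed folder since dbt v1.0; older projects use data/)

country_code,country_name
US,United States
CA,Canada
MX,Mexico

File: models/country_info.sql

select
    country_code,
    country_name
from {{ ref('countries') }}

Run dbt seed first to load the CSV, then dbt run to build the model.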
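The model above only selects from the seed. In practice, a seed is more often joined to other data as a lookup. The following is a minimal sketch of that pattern; the model name customers_with_country and the upstream model stg_customers (with customer_id and country_code columns) are hypothetical and not part of the example project.

```sql
-- models/customers_with_country.sql (hypothetical model name)
-- Assumes a hypothetical upstream model stg_customers with
-- customer_id and country_code columns.
select
    c.customer_id,
    c.country_code,
    s.country_name
from {{ ref('stg_customers') }} as c
-- Left join keeps customers whose code is missing from the seed;
-- country_name is null for those rows.
left join {{ ref('countries') }} as s
    on c.country_code = s.country_code
```

Using ref() for the seed also lets dbt record the dependency, so the seed appears upstream of the model in the project's lineage.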
When to Use
Use seeds when you have small, static datasets that support your transformations. Common cases include:
- Reference tables like country codes, product categories, or status codes.
- Lookup tables that rarely change and don't require a full database connection.
- Data that you want to version control alongside your dbt project for easy updates and tracking.
Seeds are not ideal for large or frequently changing data because dbt seed reloads the entire CSV on every run, which becomes slow and inefficient as the file grows.
Key Points
- Seeds load CSV files as tables inside your data warehouse.
- They are best for small, static reference data.
- Seeds simplify managing lookup data within your dbt project.
- Seeds are not suitable for large or frequently updated datasets.