Bird
Raised Fist0
dbtdata~20 mins

Loading CSV seeds in dbt - Practice Problems & Coding Challenges

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Challenge - 5 Problems
🎖️
Seed Mastery Badge
Get all challenges correct to earn this badge!
Test your skills under time pressure!
Predict Output
intermediate
2:00remaining
Output of seed loading with incremental model

Given a dbt seed file products.csv with 3 rows, and an incremental model that selects all rows from this seed, what will be the number of rows in the model after two runs if the seed file is unchanged?

dbt
models/incremental_products.sql

{{ config(materialized='incremental', unique_key='product_id') }}

select * from {{ ref('products') }}
A3 rows
B6 rows
C0 rows
DError due to duplicate keys
Attempts:
2 left
💡 Hint

Think about how incremental models handle existing rows with the same unique key.

data_output
intermediate
2:00remaining
Data output after loading a CSV seed with transformations

You have a seed CSV file sales.csv with columns date, amount. You create a dbt model that loads this seed and adds a new column amount_taxed which is amount * 1.1. What is the output table if the seed has rows:

date,amount
2024-01-01,100
2024-01-02,200
dbt
select date, amount, amount * 1.1 as amount_taxed from {{ ref('sales') }}
A[{'date': '2024-01-01', 'amount': 100, 'amount_taxed': 110.0}, {'date': '2024-01-02', 'amount': 200, 'amount_taxed': 220.0}]
B[{'date': '2024-01-01', 'amount': 100, 'amount_taxed': 100.0}, {'date': '2024-01-02', 'amount': 200, 'amount_taxed': 200.0}]
C[{'date': '2024-01-01', 'amount': 110, 'amount_taxed': 110.0}, {'date': '2024-01-02', 'amount': 220, 'amount_taxed': 220.0}]
D[{'date': '2024-01-01', 'amount': 100, 'amount_taxed': null}, {'date': '2024-01-02', 'amount': 200, 'amount_taxed': null}]
Attempts:
2 left
💡 Hint

Remember to multiply the amount by 1.1 to get amount_taxed.

🔧 Debug
advanced
2:00remaining
Identify the error when loading a CSV seed with invalid config

Consider this dbt seed configuration in dbt_project.yml:

seeds:
  my_project:
    sales:
      file: sales_data.csv
      delimiter: ';'
      header: false

What error will occur when running dbt seed?

AError: The seed file 'sales_data.csv' is missing header row but 'header' is set to false, causing parsing failure.
BError: Delimiter ';' is not supported by dbt seed.
CNo error, seed loads successfully with semicolon delimiter and no header.
DError: Invalid config key 'file' in seed configuration.
Attempts:
2 left
💡 Hint

Check the valid configuration keys for seeds in dbt.

🧠 Conceptual
advanced
2:00remaining
Understanding seed freshness and reload behavior

When you run dbt seed multiple times without changing the CSV file, what happens to the data in the database table created by the seed?

AThe seed command does nothing if the CSV file is unchanged.
BThe table is dropped and recreated each time, so data is replaced with the CSV content.
CNew rows are appended to the existing table, causing duplicates.
DThe table is updated only if the CSV file has a newer timestamp.
Attempts:
2 left
💡 Hint

Think about how dbt seed manages the target table on each run.

🚀 Application
expert
3:00remaining
Predict the output of a seed with mixed data types and nulls

You have a seed CSV users.csv with columns id, name, age:

id,name,age
1,Alice,30
2,Bob,
3,Charlie,25

You load this seed in dbt and run a model that selects all rows where age > 25. What rows will the model return?

dbt
select * from {{ ref('users') }} where age > 25
A[{'id': 1, 'name': 'Alice', 'age': 30}, {'id': 2, 'name': 'Bob', 'age': null}]
B[{'id': 1, 'name': 'Alice', 'age': 30}, {'id': 3, 'name': 'Charlie', 'age': 25}]
C[{'id': 1, 'name': 'Alice', 'age': 30}]
D[]
Attempts:
2 left
💡 Hint

Remember that null values do not satisfy comparison operators like >.

Practice

(1/5)
1. What is the main purpose of loading CSV seeds in dbt?
easy
A. To load small, fixed reference data as tables in the database
B. To run complex SQL transformations on large datasets
C. To create temporary views for data exploration
D. To export data from the database to CSV files

Solution

  1. Step 1: Understand the role of seeds in dbt

    Seeds are used to load CSV files as tables in the database, mainly for small, fixed reference data.
  2. Step 2: Compare options with seed purpose

    Options B, C, and D describe other dbt or database functions, not seed loading.
  3. Final Answer:

    To load small, fixed reference data as tables in the database -> Option A
  4. Quick Check:

    Seeds = small fixed reference data [OK]
Hint: Seeds load small fixed data as tables [OK]
Common Mistakes:
  • Thinking seeds are for large datasets
  • Confusing seeds with models or views
  • Assuming seeds export data instead of loading
2. Which folder should you place your CSV files in to load them as seeds in dbt?
easy
A. data
B. models
C. macros
D. snapshots

Solution

  1. Step 1: Recall the folder structure for dbt seeds

    CSV files for seeds must be placed in the 'data' folder inside the dbt project.
  2. Step 2: Eliminate other folders

    'models' is for SQL models, 'macros' for reusable code, 'snapshots' for snapshot data, so they are incorrect for seeds.
  3. Final Answer:

    data -> Option A
  4. Quick Check:

    Seeds folder = data [OK]
Hint: Put CSVs in 'data' folder for seeds [OK]
Common Mistakes:
  • Placing CSVs in 'models' folder
  • Confusing 'macros' with data storage
  • Using 'snapshots' folder for seeds
3. Given the following dbt command run in a project with a CSV file named countries.csv in the data folder, what will happen?
dbt seed
medium
A. Nothing will happen unless you specify the table name
B. The CSV file will be deleted from the data folder
C. The CSV file will be loaded as a table named 'countries' in the database
D. The CSV file will be converted to a model SQL file

Solution

  1. Step 1: Understand the effect of dbt seed

    This command loads all CSV files in the 'data' folder as tables in the database, using the CSV filename as the table name.
  2. Step 2: Apply to the given CSV file

    The file 'countries.csv' will be loaded as a table named 'countries'. No extra arguments are needed.
  3. Final Answer:

    The CSV file will be loaded as a table named 'countries' in the database -> Option C
  4. Quick Check:

    dbt seed loads CSVs as tables named after files [OK]
Hint: dbt seed loads CSVs as tables named by file [OK]
Common Mistakes:
  • Thinking dbt seed deletes files
  • Expecting dbt seed to convert CSV to SQL
  • Believing table name must be specified manually
4. You ran dbt seed but the table did not appear in your database. Which of the following is the most likely cause?
medium
A. The CSV file is empty
B. The CSV file has a .txt extension instead of .csv
C. You forgot to run dbt run after dbt seed
D. The CSV file is not placed inside the 'data' folder

Solution

  1. Step 1: Check the seed loading requirements

    dbt only loads CSV files placed inside the 'data' folder when running dbt seed.
  2. Step 2: Analyze the options

    If the CSV is outside 'data', dbt seed ignores it. A .txt file won't be loaded. Running dbt run is unrelated to seeds. An empty CSV still creates an empty table.
  3. Final Answer:

    The CSV file is not placed inside the 'data' folder -> Option D
  4. Quick Check:

    CSV must be in 'data' folder for seed [OK]
Hint: CSV must be in 'data' folder to load [OK]
Common Mistakes:
  • Assuming dbt run loads seeds
  • Ignoring file extension importance
  • Thinking empty CSV prevents table creation
5. You have a CSV file products.csv with columns id, name, and price. You want to load it as a seed and then create a model that selects only products with price > 100. Which steps should you follow?
hard
A. Place products.csv in 'data', run dbt run, then create a model filtering price > 100
B. Place products.csv in 'data', run dbt seed, then create a model SQL selecting from the seed table filtering price > 100
C. Place products.csv in 'models', run dbt run, then filter price in the CSV file
D. Place products.csv in 'snapshots', run dbt seed, then create a model selecting all products

Solution

  1. Step 1: Load CSV as seed

    Place the CSV in the 'data' folder and run dbt seed to load it as a table.
  2. Step 2: Create a model filtering data

    Create a model SQL file that selects from the seed table and filters products where price > 100.
  3. Step 3: Understand why other options fail

    Place products.csv in 'models', run dbt run, then filter price in the CSV file places CSV in wrong folder and filters CSV manually. Place products.csv in 'data', run dbt run, then create a model filtering price > 100 misses running dbt seed. Place products.csv in 'snapshots', run dbt seed, then create a model selecting all products uses wrong folder and does not filter.
  4. Final Answer:

    Place products.csv in 'data', run dbt seed, then create a model SQL selecting from the seed table filtering price > 100 -> Option B
  5. Quick Check:

    Seed CSV in 'data' + dbt seed + model filter = correct [OK]
Hint: Seed CSV in 'data', run dbt seed, then model filter [OK]
Common Mistakes:
  • Placing CSV in wrong folder
  • Skipping dbt seed command
  • Filtering data outside SQL model