
Seeds for static reference data in dbt - Time & Space Complexity

Time Complexity: Seeds for static reference data
O(n)
Understanding Time Complexity

We want to understand how the time to load static reference data using seeds in dbt changes as the data size grows.

How does the number of rows in the seed file affect the loading time?

Scenario Under Consideration

Analyze the time complexity of this dbt seed loading snippet.


-- seeds/my_reference_data.csv
id,name
1,Category A
2,Category B
3,Category C

# dbt_project.yml -- dbt discovers CSV files in the seeds/ directory by
# filename, so no file path is configured here; per-seed settings such
# as +enabled are optional
seeds:
  my_project:
    my_reference_data:
      +enabled: true

-- Usage in a model (run `dbt seed` first to load the table)
select * from {{ ref('my_reference_data') }}

Running dbt seed loads the static CSV file into a table in the target warehouse, and the model then references that table via ref().

Identify Repeating Operations

Look at what happens when dbt loads the seed data.

  • Primary operation: Reading each row from the CSV file and inserting it into the database table.
  • How many times: Once per row in the seed file.
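The per-row work can be modeled with a short Python sketch. This is a simplified stand-in, not dbt's actual seed loader (which batches inserts through the warehouse adapter); the function name and SQLite backend are illustrative assumptions.

```python
import csv
import io
import sqlite3

def load_seed(conn, csv_text, table):
    """Simplified model of a seed load: one read and one insert per CSV row."""
    conn.execute(f"CREATE TABLE {table} (id INTEGER, name TEXT)")
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    rows_inserted = 0
    for row in reader:  # touched exactly once per data row -> O(n)
        conn.execute(f"INSERT INTO {table} VALUES (?, ?)", row)
        rows_inserted += 1
    return rows_inserted

conn = sqlite3.connect(":memory:")
csv_text = "id,name\n1,Category A\n2,Category B\n3,Category C\n"
print(load_seed(conn, csv_text, "my_reference_data"))  # 3 rows -> 3 inserts
```

The loop body runs once per data row, which is exactly the "once per row" primary operation identified above.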
How Execution Grows With Input

As the number of rows in the seed file increases, the operations increase proportionally.

Input Size (n)    Approx. Operations
10                10 row reads and inserts
100               100 row reads and inserts
1000              1000 row reads and inserts

Pattern observation: The work grows directly with the number of rows; doubling rows doubles the work.
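The pattern can be checked directly by counting operations in a toy model of the load (illustrative Python; the function is a hypothetical counter, not part of dbt):

```python
def seed_load_operations(n_rows):
    """Count read-plus-insert operations for a seed file with n_rows data rows."""
    ops = 0
    for _ in range(n_rows):  # one read and one insert per row
        ops += 1
    return ops

for n in (10, 100, 1000):
    print(n, seed_load_operations(n))

# Doubling the rows doubles the work:
assert seed_load_operations(2000) == 2 * seed_load_operations(1000)
```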

Final Time Complexity

Time Complexity: O(n)

This means the time to load seeds grows linearly with the number of rows in the seed file.

Common Mistake

[X] Wrong: "Loading seeds is instant no matter how big the file is."

[OK] Correct: Each row must be read and inserted, so bigger files take more time.

Interview Connect

Understanding how seed loading scales helps you explain data pipeline performance clearly and shows you grasp practical data engineering concepts.

Self-Check

What if we compressed the seed file and loaded it directly? How might that affect the time complexity?