
Loading CSV seeds in dbt - Time & Space Complexity

Time Complexity: Loading CSV seeds
O(n)
Understanding Time Complexity

When loading CSV seeds in dbt, we want to understand how the time to load data changes as the CSV file grows larger.

We ask: How does the loading time increase when the CSV has more rows?

Scenario Under Consideration

Analyze the time complexity of the following dbt seed loading snippet.


# dbt seed configuration example (in dbt_project.yml)
seeds:
  my_project:
    my_seed:
      +quote_columns: true
      +delimiter: ','

# dbt command to load the seed:
#   dbt seed --select my_seed

This configuration tells dbt how to parse the CSV; running `dbt seed` then loads the file into a table in the data warehouse.

Identify Repeating Operations

Loading a CSV seed involves reading each row and inserting it into the database.

  • Primary operation: Reading and inserting each row from the CSV file.
  • How many times: Once per row in the CSV file.
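The per-row work described above can be modeled with a short Python sketch. This is a simplified illustration, not dbt's actual implementation (dbt delegates the load to the adapter, which typically batches inserts); the `load_seed` function and table name are hypothetical, and a SQLite in-memory database stands in for the warehouse.

```python
import csv
import io
import sqlite3

def load_seed(csv_text: str, conn: sqlite3.Connection, table: str) -> int:
    """Simplified model of a seed load: one read + one insert per CSV row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)                      # first row holds column names
    cols = ", ".join(header)                   # illustrative only; real loaders quote identifiers
    conn.execute(f"CREATE TABLE {table} ({cols})")
    placeholders = ", ".join("?" for _ in header)
    rows_inserted = 0
    for row in reader:                         # one iteration per data row -> O(n)
        conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", row)
        rows_inserted += 1
    return rows_inserted

conn = sqlite3.connect(":memory:")
csv_text = "id,country\n1,US\n2,BR\n3,DE\n"
print(load_seed(conn=conn, csv_text=csv_text, table="my_seed"))  # prints 3
```

The single loop over the reader is the whole story: the work done is proportional to the number of data rows, which is exactly the O(n) behavior analyzed here.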
How Execution Grows With Input

As the number of rows in the CSV file increases, the time to load grows roughly in direct proportion.

Input Size (n)   Approx. Operations
10               About 10 row reads and inserts
100              About 100 row reads and inserts
1000             About 1000 row reads and inserts

Pattern observation: Doubling the rows roughly doubles the work and time needed.
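You can verify the pattern directly by counting the per-row operations for CSVs of different sizes. A minimal sketch (the `count_row_operations` helper is hypothetical, introduced only for this demonstration):

```python
import csv
import io

def count_row_operations(csv_text: str) -> int:
    """Count the per-row operations a seed load would perform (header excluded)."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)                       # skip the header row
    return sum(1 for _ in reader)      # one operation per data row

for n in (10, 100, 1000):
    text = "id\n" + "\n".join(str(i) for i in range(n)) + "\n"
    print(n, count_row_operations(text))
```

Each tenfold increase in rows yields a tenfold increase in operations, matching the table above.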

Final Time Complexity

Time Complexity: O(n)

This means the loading time grows linearly with the number of rows in the CSV file.

Common Mistake

[X] Wrong: "Loading a CSV seed is instant no matter the size."

[OK] Correct: Each row must be read and inserted, so bigger files take more time.

Interview Connect

Understanding how data loading scales helps you explain performance in real projects and shows you think about efficiency.

Self-Check

"What if the CSV file had multiple columns with complex data types? How would that affect the time complexity?"