
source() function for raw tables in dbt - Time & Space Complexity

Time Complexity: source() function for raw tables
O(n)
Understanding Time Complexity

We want to understand how the time to run a dbt model using the source() function changes as the size of the raw table grows.

Specifically, how does the data volume affect the work dbt does when reading from a raw table?

Scenario Under Consideration

Analyze the time complexity of the following dbt code snippet.


select *
from {{ source('raw_schema', 'raw_table') }}
where event_date >= '2024-01-01'

This code reads data from a raw table using source() and filters rows by date.
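
To make the per-row work concrete, here is a minimal Python sketch that models what the database does when applying the date filter: it visits every row once. The table data and function name are hypothetical stand-ins, not part of dbt.

```python
# Hypothetical in-memory stand-in for the raw table; each dict is one row.
raw_table = [
    {"event_id": 1, "event_date": "2023-12-30"},
    {"event_id": 2, "event_date": "2024-01-01"},
    {"event_id": 3, "event_date": "2024-02-15"},
]

def filter_by_date(rows, cutoff):
    """Model of the WHERE clause: one pass over the rows, one check each (O(n))."""
    result = []
    for row in rows:                         # visit every row exactly once
        if row["event_date"] >= cutoff:      # ISO dates compare correctly as strings
            result.append(row)
    return result

recent = filter_by_date(raw_table, "2024-01-01")
# Two of the three rows pass the filter.
```

The key point is that the loop body runs once per row, so the cost of the filter scales with the table, not with the number of matching rows.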

Identify Repeating Operations

Identify the repeated operations: loops, recursion, or row-by-row traversals.

  • Primary operation: Scanning rows in the raw table to apply the date filter.
  • How many times: Once for each row in the raw table.

How Execution Grows With Input

As the number of rows in the raw table grows, the number of rows scanned grows at the same rate.

Input Size (n) | Approx. Operations
10             | About 10 row checks
100            | About 100 row checks
1000           | About 1000 row checks

Pattern observation: The work grows directly with the number of rows; doubling rows doubles work.
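
The pattern above can be checked directly. This is a small Python sketch (the function and synthetic rows are illustrative, not part of dbt) that counts how many date comparisons a full scan performs for each input size:

```python
def scan_and_filter(rows, cutoff):
    """One pass over the rows: return (matching_rows, comparisons_made)."""
    kept, checks = [], 0
    for row in rows:
        checks += 1                    # one date comparison per row
        if row >= cutoff:              # ISO date strings compare correctly
            kept.append(row)
    return kept, checks

for n in (10, 100, 1000):
    rows = ["2024-01-01"] * n          # n synthetic rows
    _, checks = scan_and_filter(rows, "2024-01-01")
    print(n, checks)                   # checks always equals n
```

Note that the comparison count depends only on the number of rows scanned, never on how many rows pass the filter.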

Final Time Complexity

Time Complexity: O(n)

This means the time to run grows linearly with the number of rows in the raw table.

Common Mistake

[X] Wrong: "Using source() is instant and does not depend on table size."

[OK] Correct: source() only resolves to the raw table's location; the database still has to scan rows to apply the filter, and that scan takes longer as the table grows.
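
One way to see why the mistake is tempting: resolving the source name is cheap and independent of table size, while the row scan is the part that grows. The sketch below is a hypothetical Python model of that distinction (the lookup table, relation string, and function names are invented for illustration):

```python
# Hypothetical model: name resolution is a constant-time dictionary lookup,
# but answering the query still requires a linear scan of the rows.
SOURCES = {("raw_schema", "raw_table"): "analytics_db.raw_schema.raw_table"}

def resolve_source(schema, table):
    """O(1): compile-time name resolution, independent of table size."""
    return SOURCES[(schema, table)]

def run_query(event_dates, cutoff):
    """O(n): the database still checks every row to apply the filter."""
    return [d for d in event_dates if d >= cutoff]

relation = resolve_source("raw_schema", "raw_table")
# resolve_source is effectively instant; run_query's cost grows with the row count.
```

So "source() is fast" is true of the name lookup, not of the query it produces.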

Interview Connect

Understanding how data size affects query time is key for writing efficient dbt models and working with raw data sources.

Self-Check

What if we added an index on event_date? How would the time complexity change?