
Snapshot tables for historical tracking in dbt - Time & Space Complexity

Time Complexity: Snapshot tables for historical tracking
O(n)
Understanding Time Complexity

When using snapshot tables in dbt, we want to know how the time to update snapshots changes as data grows.

We ask: How does the work grow when there are more records to track?

Scenario Under Consideration

Analyze the time complexity of the following dbt snapshot configuration.

{% snapshot customer_snapshot %}

{{
    config(
      target_schema='snapshots',
      unique_key='customer_id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}

select * from source.customers

{% endsnapshot %}

This snapshot tracks changes in the customers table by checking the updated_at timestamp for each record.
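To make the per-record check concrete, here is a minimal Python sketch of what the timestamp strategy decides for a single record. This is a simplified model, not dbt's actual implementation; the row dictionaries and the needs_update helper are illustrative.

```python
from datetime import datetime

def needs_update(source_row, snapshot_row):
    """Simplified model of dbt's timestamp strategy: a source row counts
    as changed when its updated_at is newer than the snapshot's copy."""
    return source_row["updated_at"] > snapshot_row["updated_at"]

source = {"customer_id": 1, "updated_at": datetime(2024, 5, 2)}
snapshot = {"customer_id": 1, "updated_at": datetime(2024, 5, 1)}
print(needs_update(source, snapshot))  # True: the row changed since the last run
```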

Identify Repeating Operations

Look for repeated work done during snapshot updates.

  • Primary operation: Comparing each source record's timestamp to the snapshot's stored timestamp.
  • How many times: Once for every record in the source table each time the snapshot runs.
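The two bullets above can be sketched as a single loop. This is a hypothetical simulation of one snapshot run, not dbt's generated SQL: every source row costs exactly one timestamp comparison, even when only one row actually changed.

```python
from datetime import datetime

def snapshot_run(source_rows, snapshot_index):
    """One simulated snapshot run: each source row costs one timestamp
    comparison against the stored snapshot value (a dict keyed by id)."""
    changed, comparisons = [], 0
    for row in source_rows:
        comparisons += 1
        stored = snapshot_index.get(row["customer_id"])
        if stored is None or row["updated_at"] > stored:
            changed.append(row["customer_id"])
    return changed, comparisons

old, new = datetime(2024, 1, 1), datetime(2024, 2, 1)
source = [{"customer_id": i, "updated_at": new if i == 0 else old}
          for i in range(100)]
index = {i: old for i in range(100)}
print(snapshot_run(source, index))  # only id 0 changed, yet 100 comparisons
```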
How Execution Grows With Input

As the number of records grows, the snapshot process checks each record's timestamp.

Input Size (n)    Approx. Operations
10                10 timestamp checks
100               100 timestamp checks
1,000             1,000 timestamp checks

Pattern observation: The work grows directly with the number of records.
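The table above can be reproduced with a small counter. This sketch simply counts one check per record, which is the assumption behind the table, not a measurement of dbt itself.

```python
def count_timestamp_checks(n_records):
    """Simulate a snapshot run over n_records rows; every row costs
    exactly one timestamp check, whether or not it changed."""
    checks = 0
    for _ in range(n_records):
        checks += 1  # compare source.updated_at to the stored timestamp
    return checks

for n in (10, 100, 1000):
    print(f"{n} records -> {count_timestamp_checks(n)} timestamp checks")
```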

Final Time Complexity

Time Complexity: O(n)

This means the time to update the snapshot grows linearly with the number of records: doubling the records roughly doubles the work.

Common Mistake

[X] Wrong: "Snapshot updates only check changed records, so time stays constant no matter how many records exist."

[OK] Correct: The snapshot must scan every source record to find which ones changed, so run time grows with the total record count, not just with the number of changed records.
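The correction is easy to demonstrate: even when zero rows have changed, every row still costs one check. Again a simplified model with made-up rows, not dbt's actual merge logic.

```python
from datetime import datetime

def run_snapshot(source_rows, stored):
    """Count work for one run: every row is examined, even when no row
    has a newer timestamp than its stored snapshot value."""
    checks, changed = 0, 0
    for row in source_rows:
        checks += 1  # paid for every row...
        if row["updated_at"] > stored[row["id"]]:
            changed += 1  # ...even though nothing here changed
    return checks, changed

ts = datetime(2024, 1, 1)
rows = [{"id": i, "updated_at": ts} for i in range(1000)]
stored = {i: ts for i in range(1000)}
print(run_snapshot(rows, stored))  # (1000, 0): 1000 checks, 0 changed rows
```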

Interview Connect

Understanding how snapshot updates scale helps you explain data freshness and performance in real projects.

Self-Check

What if we changed the snapshot strategy from timestamp to check columns? How would the time complexity change?
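As a hint, here is a hedged sketch of the check strategy's cost: instead of one timestamp check per row, each row costs up to one equality test per checked column, so the worst case grows as O(n * m) for n rows and m check columns. The column names and helper below are illustrative, not dbt internals.

```python
def check_strategy_cost(rows, stored, check_cols):
    """Worst-case cost of the 'check' strategy: up to len(check_cols)
    equality tests per row instead of a single timestamp comparison."""
    comparisons = 0
    for row in rows:
        for col in check_cols:
            comparisons += 1  # one equality test per checked column
            if row[col] != stored[row["id"]][col]:
                break  # a row is "changed" at the first differing column
    return comparisons

cols = ["name", "email", "plan"]
rows = [{"id": i, "name": "a", "email": "b", "plan": "c"} for i in range(5)]
stored = {i: {"name": "a", "email": "b", "plan": "c"} for i in range(5)}
print(check_strategy_cost(rows, stored, cols))  # 15 = 5 rows x 3 columns
```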