Snapshot tables for historical tracking in dbt - Time & Space Complexity
When using snapshot tables in dbt, we want to know how the time to update snapshots changes as data grows.
We ask: How does the work grow when there are more records to track?
Analyze the time complexity of the following dbt snapshot configuration.
```sql
{% snapshot customer_snapshot %}

{{
    config(
        target_schema='snapshots',
        strategy='timestamp',
        unique_key='customer_id',
        updated_at='updated_at'
    )
}}

select * from source.customers

{% endsnapshot %}
```
This snapshot tracks changes in the customers table by checking the updated_at timestamp for each record.
Look for repeated work done during snapshot updates.
- Primary operation: Comparing each source record's timestamp to the snapshot's stored timestamp.
- How many times: Once for every record in the source table each time the snapshot runs.
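The per-record comparison can be sketched as a simplified Python model. This is a hypothetical illustration of the logic, not dbt's actual generated merge SQL; `source_rows` and `snapshot_rows` are assumed inputs mapping `customer_id` to an `updated_at` value:

```python
# Simplified model of one timestamp-strategy snapshot run:
# compare each source record's updated_at to the value stored in the snapshot.
def snapshot_run(source_rows, snapshot_rows):
    """source_rows / snapshot_rows: dicts mapping customer_id -> updated_at."""
    changed = []
    for customer_id, updated_at in source_rows.items():  # one check per source record
        stored = snapshot_rows.get(customer_id)
        if stored is None or updated_at > stored:  # new record, or timestamp moved forward
            changed.append(customer_id)
    return changed
```

The loop visits every source record once regardless of how many actually changed, which is the key observation for the analysis below.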
As the table grows, each snapshot run still has to perform one timestamp check per source record:
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 timestamp checks |
| 100 | 100 timestamp checks |
| 1000 | 1000 timestamp checks |
Pattern observation: The work grows directly with the number of records.
Time Complexity: O(n)
This means the time to update the snapshot grows linearly: doubling the number of records roughly doubles the update time.
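A quick way to confirm the pattern in the table is to count the comparisons directly. This small sketch (an illustrative counter, not part of dbt) reproduces the operation counts above:

```python
# Count the timestamp checks a single snapshot run performs for n records.
def count_checks(n):
    checks = 0
    for _ in range(n):      # one timestamp comparison per record
        checks += 1
    return checks

for n in (10, 100, 1000):
    print(n, count_checks(n))  # checks grow in direct proportion to n
```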
[X] Wrong: "Snapshot updates only check changed records, so time stays constant no matter how many records exist."
[OK] Correct: The snapshot must scan all records to find which ones changed, so time grows with total records, not just changed ones.
Understanding how snapshot updates scale helps you explain data freshness and performance in real projects.
What if we changed the snapshot strategy from timestamp to check columns? How would the time complexity change?
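One way to reason about this: with the check strategy, dbt compares the values of the tracked columns instead of a single timestamp, so each record costs one comparison per checked column rather than one comparison total. A rough sketch of that model (hypothetical Python illustration, not dbt's generated SQL; `check_cols` is the list of columns configured under `check_cols`):

```python
# Simplified model of a check-strategy snapshot run: for each source record,
# compare every tracked column's value against the stored snapshot row.
def check_strategy_run(source_rows, snapshot_rows, check_cols):
    """Rows are dicts of column -> value, keyed by the unique_key."""
    changed = []
    for key, row in source_rows.items():
        stored = snapshot_rows.get(key)
        # one comparison per checked column -> roughly O(n * m) total work,
        # where n is the record count and m is the number of checked columns
        if stored is None or any(row[c] != stored[c] for c in check_cols):
            changed.append(key)
    return changed
```

Under this model the work grows with both the record count and the number of checked columns; since the column list is usually a small constant, the complexity is still linear in the number of records.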