Querying historical data (AT, BEFORE) in Snowflake - Time & Space Complexity
When we analyze queries over historical data, the question is how response time changes as the table grows.
Specifically, how does Snowflake handle a query that asks for the table's state at, or just before, a given point in time?
Analyze the time complexity of the following operation sequence.
```sql
-- Query data as it was at a specific time (AT is inclusive of that instant)
SELECT *
FROM my_table
AT (TIMESTAMP => '2024-01-01 00:00:00'::TIMESTAMP_LTZ);

-- Query data as it was immediately before a specific time (BEFORE is exclusive)
SELECT *
FROM my_table
BEFORE (TIMESTAMP => '2024-01-01 00:00:00'::TIMESTAMP_LTZ);
```
This sequence retrieves the table's contents as they existed at (inclusive) or immediately before (exclusive) a given timestamp, using Snowflake's Time Travel feature.
Identify the API calls, resource provisioning, and data transfers that repeat.
- Primary operation: Reading data files from storage that match the requested time state.
- How many times: Once per query, but the amount of data scanned depends on how much changed since that time.
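The scan-selection step above can be sketched in code. The following is a minimal Python model, not Snowflake's actual engine: it assumes (hypothetically) that a table is stored as immutable partitions, each tagged with the time it became active and the time it was superseded, so that answering `AT (TIMESTAMP => t)` means reading every partition that was live at `t`. The `Partition` class and `rows_scanned_at` function are illustrative names, not Snowflake APIs.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical model: the table is a set of immutable partitions.
# Each partition records when it became active and (optionally) when
# it was superseded by a rewrite. A time-travel query at time t must
# scan every partition that was live at t.

@dataclass
class Partition:
    rows: int
    added_at: int                      # time the partition became active
    removed_at: Optional[int] = None   # time it was superseded (None = still live)

def rows_scanned_at(partitions: List[Partition], t: int) -> int:
    """Rows scanned to answer an AT-style query: all partitions live at time t."""
    return sum(
        p.rows for p in partitions
        if p.added_at <= t and (p.removed_at is None or p.removed_at > t)
    )

parts = [
    Partition(rows=1000, added_at=0, removed_at=5),  # rewritten at t=5
    Partition(rows=1200, added_at=5),                # its replacement
    Partition(rows=3000, added_at=2),                # untouched since t=2
]

print(rows_scanned_at(parts, 3))  # state at t=3: 1000 + 3000 = 4000
print(rows_scanned_at(parts, 6))  # state at t=6: 1200 + 3000 = 4200
```

The key point the model makes concrete: the query does no row-by-row "rewind"; it selects which files to read based on version metadata, and the cost is the volume of those files.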
As the table grows, the query must read more data files to reconstruct the state at or before the timestamp.
| Input Size (n rows) | Approx. Data Scanned |
|---|---|
| 10,000 | Baseline scan |
| 100,000 | ~10x the baseline |
| 1,000,000 | ~100x the baseline |
Pattern observation: The time grows roughly in proportion to the amount of data that changed since the requested time.
Time Complexity: O(n)
This means the query time grows linearly with the amount of data scanned to reconstruct the historical state.
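To make the linear trend explicit, here is a toy cost counter in Python. It assumes, as the table above does, that query cost is dominated by rows scanned; it counts "rows touched" rather than wall-clock time so the trend is deterministic. `scan_cost` and the partition size of 10,000 rows are illustrative assumptions, not Snowflake parameters.

```python
# Toy model: every partition live at the requested time must be read in
# full, so the scan cost is the row count rounded up to whole partitions.

def scan_cost(n_rows: int, rows_per_partition: int = 10_000) -> int:
    partitions = -(-n_rows // rows_per_partition)  # ceiling division
    return partitions * rows_per_partition

for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9} rows -> {scan_cost(n):>9} rows scanned")
```

Growing the input 10x grows the scanned volume 10x, which is exactly what O(n) predicts.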
[X] Wrong: "Querying historical data is always instant regardless of data size."
[OK] Correct: The system must scan data changes since the requested time, so larger or older data means more work and longer query time.
Understanding how querying historical data scales helps you explain real-world trade-offs when working with time travel features in cloud data platforms.
"What if we queried data using a very recent timestamp close to the current time? How would the time complexity change?"