Why optimization controls Snowflake costs - Performance Analysis
We want to understand how the cost of running Snowflake queries grows as data volume or workload grows.
The key question: how does the amount of work performed (and therefore the cost) scale with input size, with and without optimization?
Analyze the time complexity of the following query execution steps.
-- Query scanning a large table (no clustering key: full scan)
SELECT COUNT(*) FROM big_table WHERE condition = 'value';
-- Define a clustering key so Snowflake can prune micro-partitions
ALTER TABLE big_table CLUSTER BY (condition);
-- Same query after clustering: partition pruning reduces scanned data
SELECT COUNT(*) FROM big_table WHERE condition = 'value';
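The pruning idea behind these two queries can be sketched in Python. This is an illustrative simulation, not Snowflake's actual implementation: each micro-partition is modeled as a list of rows with min/max metadata, and a partition is skipped when its min/max range cannot contain the filter value. The data sizes and value set are made up for the example.

```python
import random

random.seed(0)
VALUES = ["a", "b", "c", "d", "value"]
ROWS = [random.choice(VALUES) for _ in range(10_000)]
ROWS_PER_PARTITION = 100

def partition(rows):
    # Split rows into fixed-size "micro-partitions" (simplified model)
    return [rows[i:i + ROWS_PER_PARTITION]
            for i in range(0, len(rows), ROWS_PER_PARTITION)]

def count_with_pruning(partitions, target):
    """COUNT(*) WHERE col = target, skipping any partition whose
    min/max metadata shows it cannot contain the target value."""
    scanned, total = 0, 0
    for part in partitions:
        if not (min(part) <= target <= max(part)):
            continue  # pruned: this partition is never read
        scanned += 1
        total += sum(1 for v in part if v == target)
    return total, scanned

unclustered = partition(ROWS)        # matching rows scattered everywhere
clustered = partition(sorted(ROWS))  # clustering co-locates similar values

count_u, scanned_u = count_with_pruning(unclustered, "value")
count_c, scanned_c = count_with_pruning(clustered, "value")
```

Both layouts return the same count, but the clustered layout scans far fewer partitions, because the matching rows sit in a contiguous range instead of being spread across every partition.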
The sequence above runs the same query twice: once as a full scan, and once after an optimization that reduces how much data must be scanned.
To analyze the cost, look at which operation repeats during query execution and how often it repeats.
- Primary operation: Scanning data blocks (micro-partitions) in the table.
- How many times: Once per micro-partition scanned, which depends on data size and optimization.
As the table grows, the number of micro-partitions scanned grows too.
| Input Size (n) | Approx. Micro-partitions Scanned (assuming ~100 MB each) |
|---|---|
| 10 GB | 100 |
| 100 GB | 1000 |
| 1 TB | 10,000 |
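The table values come from a simple linear model. The ~100 MB-per-partition figure is an illustrative assumption for this lesson (real micro-partition sizes vary), but the proportional relationship is the point:

```python
# Linear model of scan cost: without pruning, every micro-partition
# must be read, so partitions scanned grow in direct proportion to
# table size. MB_PER_PARTITION is an illustrative assumption.
MB_PER_PARTITION = 100

def partitions_scanned(table_size_mb):
    return table_size_mb // MB_PER_PARTITION

sizes_mb = {"10 GB": 10_000, "100 GB": 100_000, "1 TB": 1_000_000}
scanned = {label: partitions_scanned(mb) for label, mb in sizes_mb.items()}
# scanned == {"10 GB": 100, "100 GB": 1000, "1 TB": 10000}
```

Note that multiplying the table size by 10 multiplies the scan work by 10, which is exactly what O(n) growth means.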
Without optimization, scanned data grows roughly in direct proportion to table size.
Time Complexity: O(n)
Here n is the amount of data scanned: doubling the table roughly doubles the cost of the query.
[X] Wrong: "Query cost stays the same no matter how big the data is."
[OK] Correct: More data means more micro-partitions to scan, so cost grows with data size unless optimized.
Understanding how query cost grows helps you design efficient data models and optimize queries, a key skill in cloud data work.
"What if we add clustering keys to the table? How would the time complexity change when filtering queries run?"