0
0
Snowflakecloud~7 mins

Clustering keys for large tables in Snowflake - Commands & Configuration

Choose your learning style9 modes available
Introduction
Large tables can become slow to query because data is not organized efficiently. Clustering keys help Snowflake organize data in a way that speeds up queries by grouping related rows together.
When you have a very large table and queries filter on specific columns often
When you notice queries scanning too much data and running slowly
When you want to improve performance without manually partitioning data
When your table grows continuously and you want to keep data organized automatically
When you want to reduce the cost of querying large datasets by scanning less data
Commands
This command adds a clustering key on the columns 'region' and 'sale_date' to the 'sales_data' table. It tells Snowflake to organize data based on these columns to speed up queries filtering on them.
Terminal
ALTER TABLE sales_data CLUSTER BY (region, sale_date);
Expected OutputExpected
ALTER TABLE
This command checks how well the 'sales_data' table is clustered. It shows if the clustering key is effective or if Snowflake needs to reorganize data.
Terminal
SELECT SYSTEM$CLUSTERING_INFORMATION('sales_data');
Expected OutputExpected
{ "clustering_depth": 3, "average_depth": 2.5, "partition_count": 10 }
This command tells Snowflake to reorganize the data in 'sales_data' based on the clustering key. It helps improve query performance by physically sorting the data.
Terminal
ALTER TABLE sales_data RECLUSTER;
Expected OutputExpected
ALTER TABLE
This query benefits from the clustering key by scanning less data when filtering on 'region' and 'sale_date'. It runs faster because data is organized by these columns.
Terminal
SELECT COUNT(*) FROM sales_data WHERE region = 'East' AND sale_date = '2023-05-01';
Expected OutputExpected
12345
Key Concept

If you remember nothing else from this pattern, remember: clustering keys help Snowflake organize large tables to speed up queries filtering on specific columns.

Common Mistakes
Adding clustering keys on columns that are rarely used in filters
Snowflake spends resources organizing data that does not improve query speed, wasting compute and storage.
Choose clustering keys based on columns frequently used in WHERE clauses or join conditions.
Not running RECLUSTER after adding or changing clustering keys
Data remains unordered physically, so queries do not benefit from clustering until reclustering happens.
Run ALTER TABLE RECLUSTER to reorganize data after setting clustering keys.
Expecting immediate query speedup without checking clustering effectiveness
Clustering may not be effective if data is not well distributed or if clustering depth is low.
Use SYSTEM$CLUSTERING_INFORMATION to monitor clustering quality and adjust keys if needed.
Summary
Use ALTER TABLE CLUSTER BY to set clustering keys on columns used often in queries.
Check clustering effectiveness with SYSTEM$CLUSTERING_INFORMATION.
Run ALTER TABLE RECLUSTER to physically reorganize data and improve query speed.