How to Use Clustered Tables in BigQuery: Syntax and Example
To use a clustered table in BigQuery, specify the CLUSTER BY clause when creating the table to group data by one or more columns. BigQuery then stores rows with similar values in those columns together, which speeds up queries that filter on them at no extra storage cost.
Syntax
Use the CLUSTER BY clause in your CREATE TABLE or CREATE TABLE AS SELECT statement to define clustering columns. These columns should be ones you often filter or group by.
- CLUSTER BY column_name[, column_name2, ...]: Lists the columns to cluster the table on (up to four). Order matters: data is sorted by the first column, then the second, and so on.
- CREATE TABLE: Starts the table creation.
- AS SELECT: Optional; creates the table from query results.
sql
CREATE TABLE dataset.table_name (
  column1 STRING,
  column2 INT64,
  column3 DATE
)
CLUSTER BY column1, column3;
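The syntax above only shows the plain CREATE TABLE form. As a minimal sketch of the AS SELECT variant, the statement below places CLUSTER BY before AS. The names dataset.table_name_copy and dataset.source_table are hypothetical placeholders, and the source table is assumed to contain the three listed columns.

```sql
-- Hypothetical names: dataset.table_name_copy and dataset.source_table.
-- CLUSTER BY comes before AS in a CREATE TABLE ... AS SELECT statement.
CREATE TABLE dataset.table_name_copy
CLUSTER BY column1, column3
AS
SELECT
  column1,
  column2,
  column3
FROM dataset.source_table;
```

The column types are taken from the query result, so no column list is needed in this form.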
Example
This example creates a clustered table named sales_clustered in the mydataset dataset. It clusters data by customer_id and sale_date to speed up queries filtering on these columns.
sql
CREATE TABLE mydataset.sales_clustered (
  sale_id STRING,
  customer_id STRING,
  sale_date DATE,
  amount FLOAT64
)
CLUSTER BY customer_id, sale_date;
Output
Table mydataset.sales_clustered created with clustering on customer_id and sale_date.
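One way to confirm which columns a table is clustered on is to query the dataset's INFORMATION_SCHEMA.COLUMNS view. The sketch below assumes the sales_clustered table from the example exists in mydataset and that you have permission to read the dataset's metadata.

```sql
-- List the clustering columns of mydataset.sales_clustered in clustering order.
-- clustering_ordinal_position is NULL for columns that are not clustering columns.
SELECT
  column_name,
  data_type,
  clustering_ordinal_position
FROM mydataset.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'sales_clustered'
  AND clustering_ordinal_position IS NOT NULL
ORDER BY clustering_ordinal_position;
```

For the table created above, this should list customer_id first and sale_date second.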
Common Pitfalls
Common mistakes when using clustered tables include:
- Clustering on low-cardinality columns with only a few distinct values (such as a BOOL flag), which gives BigQuery little to prune. Clustering tends to help most on columns with many distinct values.
- Not using clustering columns in queries, so clustering does not improve performance.
- Trying to cluster on columns with data types not supported for clustering, like ARRAY, STRUCT, or FLOAT64.
Choose columns you filter or group by frequently, and list the one you filter on most often first.
sql
/* Wrong: clustering on an ARRAY column (not supported) */
CREATE TABLE mydataset.bad_clustered (
  id STRING,
  tags ARRAY<STRING>
)
CLUSTER BY tags;

/* Right: cluster on a STRING column instead */
CREATE TABLE mydataset.good_clustered (
  id STRING,
  category STRING
)
CLUSTER BY category;
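The second pitfall, not filtering on clustering columns, can be sketched with two queries against the sales_clustered table from the Example section. The literal values ('C-1001', the dates, and the amount threshold) are made up for illustration.

```sql
-- Unlikely to benefit: amount is not a clustering column,
-- so BigQuery cannot use clustering to skip data for this filter.
SELECT sale_id, customer_id, amount
FROM mydataset.sales_clustered
WHERE amount > 100;

-- Likely to benefit: filters on customer_id (the first clustering column)
-- and then on sale_date (the second), matching the CLUSTER BY order.
SELECT sale_id, sale_date, amount
FROM mydataset.sales_clustered
WHERE customer_id = 'C-1001'
  AND sale_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31';
```

Comparing the bytes processed for each query in the job details is a simple way to see whether clustering is helping.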
Quick Reference
| Feature | Description |
|---|---|
| CLUSTER BY | Defines columns to cluster data on for faster filtering |
| Supported Types | STRING, INT64, NUMERIC, BIGNUMERIC, BOOL, DATE, DATETIME, TIMESTAMP, GEOGRAPHY |
| Best Use | Columns frequently used in WHERE or GROUP BY clauses |
| Not Supported | ARRAY, STRUCT, FLOAT64 |
| Cost Impact | No extra storage cost; can reduce bytes scanned and improve query speed |
Key Takeaways
Use CLUSTER BY in CREATE TABLE to define clustering columns in BigQuery.
Choose columns that you filter or group by often; those with many distinct values tend to benefit most.
Clustering improves query speed without extra storage cost.
Avoid clustering on unsupported data types like ARRAY or STRUCT.
Queries filtering on clustering columns benefit most from clustering.