How to Use Clustered Tables in BigQuery: Syntax and Example

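To use a clustered table in BigQuery, add a CLUSTER BY clause listing up to four columns when you create the table (the clustering columns of an existing table can also be changed later). BigQuery stores rows sorted by those columns, so queries that filter or aggregate on them can skip unneeded blocks and scan less data. Clustering carries no additional storage charge.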

📐

Syntax

Use the CLUSTER BY clause in your CREATE TABLE or CREATE TABLE AS SELECT statement to define clustering columns. These columns should be ones you often filter or group by.

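CLUSTER BY column_name[, column_name2, ...]: Lists up to four columns to cluster the table on. The order matters, because BigQuery sorts by the first column, then the second, and so on.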
CREATE TABLE: Starts the table creation.
AS SELECT: Optional, to create table from query results.

sql

CREATE TABLE dataset.table_name (
  column1 STRING,
  column2 INT64,
  column3 DATE
)
CLUSTER BY column1, column3;
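The AS SELECT form mentioned above can be sketched as follows. This is a minimal illustration, not a prescribed pattern: mydataset.sales_raw is a hypothetical source table assumed to already exist with these columns, and sales_by_customer is a made-up target name.

```sql
-- Sketch: create a clustered table from query results (CTAS).
-- Assumption: mydataset.sales_raw exists and has the selected columns.
CREATE TABLE mydataset.sales_by_customer
CLUSTER BY customer_id, sale_date
AS
SELECT
  sale_id,
  customer_id,
  sale_date,
  amount
FROM mydataset.sales_raw;
```

With this form the column types come from the query, so no column list is needed; the CLUSTER BY clause goes before AS.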

💻

Example

This example creates a clustered table named sales_clustered in the mydataset dataset. It clusters data by customer_id and sale_date to speed up queries filtering on these columns.

sql

CREATE TABLE mydataset.sales_clustered (
  sale_id STRING,
  customer_id STRING,
  sale_date DATE,
  amount FLOAT64
)
CLUSTER BY customer_id, sale_date;

Output

Table mydataset.sales_clustered created with clustering on customer_id and sale_date.
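A query that should benefit from this layout filters on the clustering columns, starting with the first one. The customer ID and date range below are made-up values for illustration only.

```sql
-- Sketch: filter on both clustering columns so BigQuery can skip
-- blocks that cannot contain matching rows.
-- 'C-1001' and the date range are hypothetical example values.
SELECT
  sale_id,
  sale_date,
  amount
FROM mydataset.sales_clustered
WHERE customer_id = 'C-1001'
  AND sale_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31';
```

How much data is actually skipped depends on the table's size and how its data is distributed, so compare the bytes processed reported for the job rather than relying on an estimate.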

⚠️

Common Pitfalls

Common mistakes when using clustered tables include:

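Listing clustering columns in the wrong order. BigQuery sorts data by the columns in the order given, so put the column you filter on most often first.
Not filtering on the clustering columns in queries, or skipping the first one, so BigQuery cannot prune blocks and clustering does not improve performance.
Trying to cluster on columns with data types not supported for clustering, like ARRAY or STRUCT.

Choose columns you filter or aggregate on frequently. Clustering helps most on columns with many distinct values (high cardinality) and on tables large enough for block pruning to matter.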

sql

/* Wrong: Clustering on ARRAY column (not supported) */
CREATE TABLE mydataset.bad_clustered (
  id STRING,
  tags ARRAY<STRING>
)
CLUSTER BY tags;

/* Right: Cluster on STRING column instead */
CREATE TABLE mydataset.good_clustered (
  id STRING,
  category STRING
)
CLUSTER BY category;
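The pitfall about not using clustering columns in queries also applies to column order: BigQuery sorts by the clustering columns left to right, so a filter generally needs to include the first clustering column to prune well. A rough sketch against the sales_clustered table from the example, with made-up filter values:

```sql
/* Likely little or no pruning: the filter skips customer_id,
   the first clustering column. */
SELECT sale_id, amount
FROM mydataset.sales_clustered
WHERE sale_date = DATE '2024-01-15';

/* Better: the filter starts with the first clustering column.
   'C-1001' is a hypothetical example value. */
SELECT sale_id, amount
FROM mydataset.sales_clustered
WHERE customer_id = 'C-1001'
  AND sale_date = DATE '2024-01-15';
```

If most of your queries look like the first one, listing sale_date first in CLUSTER BY is probably the better design.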

📊

Quick Reference

Feature	Description
CLUSTER BY	Defines columns to cluster data on for faster filtering
Supported Types	STRING, INT64, NUMERIC, BIGNUMERIC, BOOL, DATE, DATETIME, TIMESTAMP, GEOGRAPHY (top-level, non-repeated columns)
Best Use	Columns frequently used in WHERE or GROUP BY clauses
Not Supported	ARRAY, STRUCT
Cost Impact	No extra storage cost; can reduce bytes scanned and improve query speed
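To check which columns an existing table is clustered on, one option is the dataset's INFORMATION_SCHEMA.COLUMNS view, which exposes a clustering_ordinal_position column. The sketch below assumes the sales_clustered table from the example exists in mydataset.

```sql
-- Sketch: list a table's clustering columns in clustering order.
-- Assumption: mydataset.sales_clustered exists.
SELECT
  column_name,
  data_type,
  clustering_ordinal_position
FROM mydataset.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'sales_clustered'
  AND clustering_ordinal_position IS NOT NULL
ORDER BY clustering_ordinal_position;
```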

✅

Key Takeaways

Use CLUSTER BY in CREATE TABLE to define clustering columns in BigQuery.

Choose columns that you filter or aggregate on often, ideally ones with many distinct values, and list the most frequently filtered column first.

Clustering improves query speed without extra storage cost.

Avoid clustering on unsupported data types like ARRAY or STRUCT.

Queries filtering on clustering columns benefit most from clustering.