How to Use Partitioned Tables in BigQuery: Syntax and Examples
In BigQuery, use the PARTITION BY clause to create a partitioned table, which divides data into segments (partitions) based on a column such as a date. Queries that filter on that column scan only the relevant partitions, which speeds them up and reduces cost. You can create partitioned tables with a SQL CREATE TABLE statement that includes PARTITION BY, or by specifying partitioning in the UI or API.
Syntax
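The basic syntax for creating a partitioned table adds a PARTITION BY clause to the CREATE TABLE statement. The clause names the column to partition by, most often a DATE column. A TIMESTAMP or DATETIME column must be wrapped in a function such as DATE() or TIMESTAMP_TRUNC(), and an INT64 column uses RANGE_BUCKET().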
- dataset.table_name: Dataset and name of the new table.
- column_definitions: List of columns and their data types.
- PARTITION BY: Column (or expression on a column) used to split data into partitions.
- AS SELECT: Optional clause that populates the new table from a query.
```sql
CREATE TABLE dataset.table_name (
  column1 STRING,
  column2 INT64,
  date_column DATE
)
PARTITION BY date_column;
```
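The AS SELECT form is the usual way to get a partitioned copy of an existing unpartitioned table, because ALTER TABLE cannot add partitioning to a table in place. The following is a minimal sketch. The source table mydataset.orders, its TIMESTAMP column order_ts, and the target mydataset.orders_partitioned are hypothetical names used only for illustration.

```sql
-- Hypothetical names: mydataset.orders (existing, unpartitioned source)
-- and mydataset.orders_partitioned (new partitioned table).
-- DATE(order_ts) turns the TIMESTAMP column into one partition per day.
CREATE TABLE mydataset.orders_partitioned
PARTITION BY DATE(order_ts)
AS
SELECT *
FROM mydataset.orders;
```

After the copy is verified, queries can be pointed at the new table.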
Example
This example creates a partitioned table named sales_partitioned in the mydataset dataset. It partitions data by the sale_date column of type DATE. Then it inserts sample data and queries only one partition.
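```sql
-- Create a table with one partition per sale_date value
CREATE TABLE mydataset.sales_partitioned (
  product STRING,
  quantity INT64,
  sale_date DATE
)
PARTITION BY sale_date;

-- Insert rows that fall into two daily partitions
INSERT INTO mydataset.sales_partitioned (product, quantity, sale_date)
VALUES
  ('apple', 10, DATE '2024-06-01'),
  ('banana', 5, DATE '2024-06-02'),
  ('orange', 8, DATE '2024-06-01');

-- Read only the 2024-06-01 partition; ORDER BY makes the row order stable
SELECT *
FROM mydataset.sales_partitioned
WHERE sale_date = DATE '2024-06-01'
ORDER BY product;
```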
Output
product | quantity | sale_date
--------|----------|-----------
apple | 10 | 2024-06-01
orange | 8 | 2024-06-01
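You can check how the example rows were split by querying the dataset's INFORMATION_SCHEMA.PARTITIONS view. This sketch assumes the mydataset.sales_partitioned table from the example. For daily partitions, partition_id holds the date in YYYYMMDD form. You would expect one partition for 20240601 (two rows) and one for 20240602 (one row).

```sql
-- List each partition of the example table with its row count.
SELECT
  partition_id,
  total_rows
FROM mydataset.INFORMATION_SCHEMA.PARTITIONS
WHERE table_name = 'sales_partitioned'
ORDER BY partition_id;
```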
Common Pitfalls
Common mistakes when using partitioned tables in BigQuery include:
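- Partitioning by an unsupported column type. Column-based partitioning needs a DATE, TIMESTAMP, DATETIME, or INT64 column (INT64 uses integer-range partitioning); STRING columns cannot be used.
- Querying without filtering on the partition column, which causes a full table scan and higher costs.
- Partitioning by a column with many distinct values, which creates many small partitions. BigQuery caps the number of partitions per table, so use a coarser granularity (for example, monthly) or cluster on high-cardinality columns instead.
- Using legacy SQL instead of GoogleSQL (standard SQL). DDL statements such as CREATE TABLE ... PARTITION BY are supported only in GoogleSQL.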
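The sketch below shows partitioning options besides a plain DATE column. It covers a TIMESTAMP column, an INT64 column, and a monthly DATE granularity combined with clustering. All table and column names (events, customers, page_views, event_ts, customer_id, view_date, user_id) are hypothetical. The 0 to 1,000,000 ID range and the bucket size of 10,000 are arbitrary illustration values.

```sql
-- TIMESTAMP column: wrap it in TIMESTAMP_TRUNC (or DATE) to set the granularity.
CREATE TABLE mydataset.events (
  event_id STRING,
  event_ts TIMESTAMP
)
PARTITION BY TIMESTAMP_TRUNC(event_ts, HOUR);

-- INT64 column: integer-range partitioning with RANGE_BUCKET.
-- GENERATE_ARRAY(start, end, interval) defines buckets of 10,000 IDs;
-- values outside the range go to a special __UNPARTITIONED__ partition.
CREATE TABLE mydataset.customers (
  customer_id INT64,
  name STRING
)
PARTITION BY RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 1000000, 10000));

-- Many distinct dates: partition by month, and cluster on the
-- high-cardinality user_id column instead of partitioning by it.
CREATE TABLE mydataset.page_views (
  user_id STRING,
  view_date DATE
)
PARTITION BY DATE_TRUNC(view_date, MONTH)
CLUSTER BY user_id;
```

Clustering sorts data within each partition. It complements partitioning for columns that are too fine-grained to partition on.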
Always filter on the partition column in your queries to get the best performance.
```sql
-- Wrong: no partition filter, scans the entire table
SELECT * FROM mydataset.sales_partitioned;

-- Right: filter on the partition column to scan only one partition
SELECT * FROM mydataset.sales_partitioned
WHERE sale_date = DATE '2024-06-01';
```
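The require_partition_filter table option enforces this advice. With it set, BigQuery rejects queries that do not filter on the partition column instead of scanning every partition. Below is a minimal sketch against the example table.

```sql
-- Reject queries on this table that lack a filter on sale_date.
ALTER TABLE mydataset.sales_partitioned
SET OPTIONS (require_partition_filter = TRUE);

-- This would now fail with an error instead of scanning the whole table:
-- SELECT * FROM mydataset.sales_partitioned;

-- Still allowed: the filter lets BigQuery prune to two partitions.
SELECT product, quantity
FROM mydataset.sales_partitioned
WHERE sale_date BETWEEN DATE '2024-06-01' AND DATE '2024-06-02';
```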
Quick Reference
| Feature | Description |
|---|---|
| Partition Column | DATE, TIMESTAMP, DATETIME, or INT64 |
| Partition Type | Time-unit column, ingestion time, or integer range |
| Query Filter | Use WHERE on partition column to reduce scanned data |
| Cost Benefit | With on-demand pricing, only bytes read from scanned partitions are billed |
| Limitations | Cannot partition by STRING columns; only one partitioning column per table |
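Ingestion-time partitioning assigns rows to partitions by when they are loaded rather than by a column value. The following is a minimal sketch using a hypothetical mydataset.raw_logs table. partition_expiration_days (an arbitrary 30 here) tells BigQuery to delete partitions automatically once they are older than that, based on each partition's date.

```sql
-- Ingestion-time partitioning: no date column is needed in the schema.
-- BigQuery fills the _PARTITIONTIME / _PARTITIONDATE pseudo-columns on load.
CREATE TABLE mydataset.raw_logs (
  message STRING
)
PARTITION BY _PARTITIONDATE
OPTIONS (partition_expiration_days = 30);

-- Filter on the pseudo-column to scan only one day's partition.
SELECT message
FROM mydataset.raw_logs
WHERE _PARTITIONDATE = DATE '2024-06-01';
```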
Key Takeaways
Use the PARTITION BY clause with a DATE, TIMESTAMP, DATETIME, or INT64 column to create partitioned tables.
Always filter queries on the partition column to improve performance and reduce cost.
Partitioning organizes data into segments, making queries faster and cheaper.
Avoid unsupported column types and partitioning schemes that create very many small partitions; cluster on high-cardinality columns instead.
With on-demand pricing, BigQuery bills only for the bytes your query scans, so partitions it skips add no cost.