How to Use Partitioned Tables in BigQuery: Syntax and Examples

In BigQuery, use the PARTITION BY clause to create a partitioned table, which divides data into segments based on a column such as a date. Queries that filter on that column scan only the relevant partitions, which makes them faster and cheaper. You can create partitioned tables with a SQL CREATE TABLE statement that includes PARTITION BY, or by specifying partitioning in the Google Cloud console, the bq command-line tool, or the API.

Syntax

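The basic syntax to create a partitioned table in BigQuery adds a PARTITION BY clause to the CREATE TABLE statement. You specify the column to partition by, usually a DATE, TIMESTAMP, or DATETIME column. A DATE column can be named directly, as below; a TIMESTAMP or DATETIME column must be wrapped in DATE() or a truncation function such as TIMESTAMP_TRUNC().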

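table_name: Name of the new table, qualified by its dataset.
column_definitions: List of columns and their data types.
PARTITION BY: Column (or expression on a column) that determines which partition each row is stored in.
AS SELECT: Optional; added after PARTITION BY to create and populate the table from a query result.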

sql

CREATE TABLE dataset.table_name (
  column1 STRING,
  column2 INT64,
  date_column DATE
)
PARTITION BY date_column;
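The bare column form shown above works for a DATE column. For a TIMESTAMP column, BigQuery expects the column wrapped in DATE() or TIMESTAMP_TRUNC(), which also sets the partition granularity, and the optional AS SELECT clause creates and fills the table in one statement. A sketch, assuming a hypothetical source table mydataset.events with columns event_id (STRING) and event_ts (TIMESTAMP); the new table names are hypothetical too:

```sql
-- Daily partitions on a TIMESTAMP column (hypothetical names).
CREATE TABLE mydataset.events_by_day (
  event_id STRING,
  event_ts TIMESTAMP
)
PARTITION BY DATE(event_ts);

-- Hourly partitions on the same kind of column.
CREATE TABLE mydataset.events_by_hour (
  event_id STRING,
  event_ts TIMESTAMP
)
PARTITION BY TIMESTAMP_TRUNC(event_ts, HOUR);

-- Create and populate a partitioned table from a query (AS SELECT).
CREATE TABLE mydataset.events_copy
PARTITION BY DATE(event_ts)
AS SELECT event_id, event_ts
FROM mydataset.events;
```

The same pattern applies to DATETIME columns, using DATE() or DATETIME_TRUNC().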

Example

This example creates a partitioned table named sales_partitioned in the mydataset dataset. It partitions data by the sale_date column of type DATE. Then it inserts sample data and queries only one partition.

sql

CREATE TABLE mydataset.sales_partitioned (
  product STRING,
  quantity INT64,
  sale_date DATE
)
PARTITION BY sale_date;

INSERT INTO mydataset.sales_partitioned (product, quantity, sale_date) VALUES
('apple', 10, '2024-06-01'),
('banana', 5, '2024-06-02'),
('orange', 8, '2024-06-01');

SELECT * FROM mydataset.sales_partitioned
WHERE sale_date = '2024-06-01'
ORDER BY product;

Output

product | quantity | sale_date
--------|----------|-----------
apple   | 10       | 2024-06-01
orange  | 8        | 2024-06-01
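To check how the sample rows were distributed, you can query the dataset's INFORMATION_SCHEMA.PARTITIONS view. This is a sketch against the example table above; with daily partitioning, partition IDs take the YYYYMMDD form, so you would expect one row per distinct sale_date.

```sql
-- List the partitions of the example table and the row count in each.
SELECT partition_id, total_rows
FROM mydataset.INFORMATION_SCHEMA.PARTITIONS
WHERE table_name = 'sales_partitioned'
ORDER BY partition_id;
```

For the three sample rows, you would expect partition 20240601 to hold two rows and 20240602 to hold one.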

Common Pitfalls

Common mistakes when using partitioned tables in BigQuery include:

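Partitioning by an unsupported column type. Time-unit partitioning needs a DATE, TIMESTAMP, or DATETIME column, integer-range partitioning needs an INT64 column, and STRING columns cannot be used at all.
Querying without filtering on the partition column, which scans every partition and raises cost.
Choosing too fine a granularity (for example, hourly partitions for low-volume data), which produces many small partitions; BigQuery caps the number of partitions per table, and clustering usually suits high-cardinality columns better.
Running the DDL in legacy SQL, which does not support statements such as CREATE TABLE; use GoogleSQL (standard SQL).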
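One way to avoid many tiny daily partitions is to partition at a coarser granularity and cluster on the high-cardinality column instead. A minimal sketch, assuming a hypothetical table name mydataset.sales_monthly with the same columns as the example:

```sql
-- Monthly partitions on a DATE column, clustered by product (hypothetical table).
CREATE TABLE mydataset.sales_monthly (
  product STRING,
  quantity INT64,
  sale_date DATE
)
PARTITION BY DATE_TRUNC(sale_date, MONTH)
CLUSTER BY product;
```

Queries that filter on a sale_date range still prune to the matching months, and clustering by product can let BigQuery skip blocks of rows within a month when the query also filters on product.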

Always filter on the partition column in your queries to get the best performance.

sql

/* Wrong: No partition filter, scans entire table */
SELECT * FROM mydataset.sales_partitioned;

/* Right: Filter on partition column to scan only one partition */
SELECT * FROM mydataset.sales_partitioned WHERE sale_date = '2024-06-01';

Quick Reference

Feature	Description
Partition column	DATE, TIMESTAMP, or DATETIME (time-unit partitioning), or INT64 (integer-range partitioning)
Partition type	Time-unit column, ingestion time, or integer range
Query filter	Use WHERE on the partition column to reduce scanned data
Cost benefit	With on-demand pricing, you are billed only for data read from the partitions the query scans
Limitations	Cannot partition by STRING columns; only one partitioning column per table
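Integer-range partitioning covers INT64 columns through RANGE_BUCKET, and ingestion-time partitioning uses the _PARTITIONDATE pseudocolumn instead of a real column. The table names, column names, and range bounds below are hypothetical; the sketch only illustrates the PARTITION BY expression for each type.

```sql
-- Integer-range partitioning: customer_id 0..99999 split into buckets of 1000.
CREATE TABLE mydataset.customers_by_id (
  customer_id INT64,
  name STRING
)
PARTITION BY RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 100000, 1000));

-- Ingestion-time partitioning: rows are assigned a partition by load date.
CREATE TABLE mydataset.raw_events (
  payload STRING
)
PARTITION BY _PARTITIONDATE;

-- Filter ingestion-time tables on the _PARTITIONDATE pseudocolumn.
SELECT payload
FROM mydataset.raw_events
WHERE _PARTITIONDATE = '2024-06-01';
```

In the integer-range table, rows whose customer_id falls outside the generated range go to a special __UNPARTITIONED__ partition.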

Key Takeaways

Use the PARTITION BY clause with a DATE, TIMESTAMP, or DATETIME column (or an INT64 column via RANGE_BUCKET) to create partitioned tables.

Always filter queries on the partition column to improve performance and reduce cost.

Partitioning organizes data into segments, making queries faster and cheaper.

Avoid overly fine partition granularity and unsupported types such as STRING; cluster on high-cardinality columns instead.

With on-demand pricing, BigQuery bills only for the data read from the partitions your query scans.