How to Schedule Queries in BigQuery Easily
To schedule a query in BigQuery, use Cloud Scheduler to trigger a Cloud Function or Cloud Run service that runs your query. Alternatively, use BigQuery Scheduled Queries in the UI or API to automate query execution at set times.
Syntax
BigQuery scheduled queries use a simple setup where you define the query, destination table, and schedule frequency. The key parts are:
- query: The SQL statement to run.
- destination_table: Where results are saved.
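- schedule: When the query runs, written in the schedule format that scheduled queries accept (for example, every 24 hours) rather than a Unix cron expression.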
- write_disposition: How to handle existing data (e.g., append or overwrite).
You can create scheduled queries via the BigQuery UI, CLI, or API.
bash
bq query --use_legacy_sql=false --destination_table=project.dataset.table --schedule='every 24 hours' --display_name='Daily Query' 'SELECT * FROM `project.dataset.source_table` WHERE DATE(timestamp) = CURRENT_DATE() - 1'
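The command above does not set a write disposition. As a minimal sketch, assuming your bq version accepts the `--append_table` and `--replace` flags of `bq query` alongside `--schedule`, the choice between appending and overwriting could be made explicit like this. The `write_flag` helper and the table name `mydataset.daily_results` are hypothetical, and the script only prints the command instead of running it.

```shell
# Hypothetical helper (not part of the bq tool): map a write mode to a bq flag.
write_flag() {
  case "$1" in
    append)    echo "--append_table=true" ;;  # add rows to the existing table
    overwrite) echo "--replace=true" ;;       # replace the table contents on each run
    *) echo "unknown write mode: $1" >&2; return 1 ;;
  esac
}

# Print (do not run) a scheduling command that overwrites the table daily.
FLAG="$(write_flag overwrite)"
echo "bq query --use_legacy_sql=false --destination_table=mydataset.daily_results $FLAG --schedule='every 24 hours' --display_name='Daily Query' 'SELECT 1 AS placeholder'"
```

Review the printed command, replace the placeholder query with your own, and run it once it looks right.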
Example
This example shows how to create a scheduled query using the bq command-line tool that runs daily and saves results to a table.
bash
bq query \
  --use_legacy_sql=false \
  --destination_table=myproject.mydataset.daily_results \
  --schedule='every 24 hours' \
  --display_name='Daily Sales Summary' \
  'SELECT product_id, SUM(sales) AS total_sales FROM `myproject.mydataset.sales` WHERE DATE(sale_date) = CURRENT_DATE() - 1 GROUP BY product_id'
Output
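Scheduled query 'Daily Sales Summary' created successfully.
The exact message depends on your bq version. Scheduled queries are stored as BigQuery Data Transfer Service transfer configurations, so the tool may instead report the resource name of the new transfer configuration.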
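To confirm that the scheduled query exists, one option is to list the transfer configurations in the location where it was created. This is a rough sketch: `list_scheduled` and `BQ_BIN` are hypothetical names introduced here, and the location `us` in the usage comment is an assumption about where your dataset lives.

```shell
# BQ_BIN is a hypothetical override so a stub can stand in for bq when testing.
BQ_BIN="${BQ_BIN:-bq}"

# List transfer configurations (scheduled queries are stored as these)
# in the location passed as the first argument.
list_scheduled() {
  "$BQ_BIN" ls --transfer_config --transfer_location="$1"
}

# Usage (requires an authenticated bq installation):
#   list_scheduled us
```

The listing includes every transfer configuration in that location, so look for the display name you chose.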
Common Pitfalls
Common mistakes when scheduling BigQuery queries include:
- Using legacy SQL instead of standard SQL (always use --use_legacy_sql=false).
- Not specifying a destination table, causing query results to be lost.
- Writing the schedule in a format BigQuery does not accept (for example, a Unix cron expression instead of a phrase like every 24 hours).
- Not setting proper permissions for the scheduler or service account.
Always test your query manually before scheduling.
bash
# Wrong: legacy SQL used, may cause errors
bq query --use_legacy_sql=true --destination_table=myproject.mydataset.results --schedule='every 24 hours' 'SELECT * FROM `myproject.mydataset.table`'

# Correct: standard SQL enabled
bq query --use_legacy_sql=false --destination_table=myproject.mydataset.results --schedule='every 24 hours' 'SELECT * FROM `myproject.mydataset.table`'
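Testing a query before scheduling it can start with a dry run, which asks BigQuery to validate the query without executing it. The sketch below wraps that in a small function. `preflight` and `BQ_BIN` are hypothetical names, not part of the bq tool, and the query in the usage comment is only a placeholder.

```shell
# BQ_BIN is a hypothetical override so a stub can stand in for bq when testing.
BQ_BIN="${BQ_BIN:-bq}"

# Validate a standard SQL query without running it.
# --dry_run makes bq check the query instead of executing it.
preflight() {
  "$BQ_BIN" query --use_legacy_sql=false --dry_run "$1"
}

# Usage (requires an authenticated bq installation):
#   preflight 'SELECT 1 AS placeholder' && echo "query is valid"
```

A dry run catches syntax errors and missing tables, but it does not show results, so a normal manual run is still worth doing before the first scheduled execution.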
Quick Reference
Here is a quick summary of scheduling queries in BigQuery:
| Step | Description |
|---|---|
| Write SQL query | Create the query you want to run regularly. |
| Choose destination | Set the table where results will be saved. |
| Set schedule | Use a schedule phrase such as 'every 24 hours'. |
| Use standard SQL | Always set --use_legacy_sql=false. |
| Create scheduled query | Use BigQuery UI, CLI, or API to schedule. |
| Check permissions | Ensure scheduler has access to run queries. |
Key Takeaways
Use the BigQuery Scheduled Queries feature, or Cloud Scheduler with Cloud Functions, to automate queries.
Always use standard SQL by setting --use_legacy_sql=false when scheduling queries.
Specify a destination table to save query results and avoid data loss.
Test your query manually before scheduling to ensure it runs correctly.
Set correct permissions for the service account running the scheduled query.