0
0
GCPcloud~5 mins

BigQuery SQL and pricing model in GCP - Commands & Configuration

Choose your learning style9 modes available
Introduction
BigQuery lets you run SQL queries on large amounts of data quickly. It charges you based on the amount of data your queries scan, helping you control costs by writing efficient queries.
When you want to analyze large datasets without managing servers.
When you need to run fast SQL queries on data stored in the cloud.
When you want to pay only for the data you actually query.
When you want to store data in tables and run standard SQL commands.
When you want to avoid upfront costs and pay per query usage.
Commands
This command runs a SQL query on BigQuery public data to find the top 5 most common names in Texas. It uses standard SQL syntax by disabling legacy SQL.
Terminal
bq query --use_legacy_sql=false 'SELECT name, COUNT(*) as count FROM `bigquery-public-data.usa_names.usa_1910_2013` WHERE state = "TX" GROUP BY name ORDER BY count DESC LIMIT 5;'
Expected OutputExpected
Waiting on bqjob_r1234567890abcdef_00000123456789_1 ... (0s) Current status: DONE +--------+-------+ | name | count | +--------+-------+ | Mary | 12345 | | James | 11234 | | John | 11000 | | Robert | 10500 | | Linda | 10000 | +--------+-------+
--use_legacy_sql=false - Use standard SQL syntax instead of legacy SQL.
This command shows detailed metadata about the public dataset table, including schema and size, helping understand the data before querying.
Terminal
bq show --format=prettyjson bigquery-public-data:usa_names.usa_1910_2013
Expected OutputExpected
{ "kind": "bigquery#table", "etag": "etagvalue", "id": "bigquery-public-data:usa_names.usa_1910_2013", "selfLink": "https://www.googleapis.com/bigquery/v2/projects/bigquery-public-data/datasets/usa_names/tables/usa_1910_2013", "tableReference": { "projectId": "bigquery-public-data", "datasetId": "usa_names", "tableId": "usa_1910_2013" }, "schema": { "fields": [ {"name": "state", "type": "STRING"}, {"name": "gender", "type": "STRING"}, {"name": "year", "type": "INTEGER"}, {"name": "name", "type": "STRING"}, {"name": "number", "type": "INTEGER"} ] }, "numRows": "1000000", "creationTime": "1356998400000", "lastModifiedTime": "1672531200000", "type": "TABLE" }
--format=prettyjson - Show output in readable JSON format.
This command performs a dry run to estimate how much data the query will scan without running it, helping control costs.
Terminal
bq query --dry_run --use_legacy_sql=false 'SELECT * FROM `bigquery-public-data.usa_names.usa_1910_2013` WHERE state = "CA";'
Expected OutputExpected
Query successfully validated. Assuming the tables are not modified, running this query will process 123456789 bytes of data.
--dry_run - Estimate query cost without running the query.
--use_legacy_sql=false - Use standard SQL syntax.
Key Concept

If you remember nothing else, remember: BigQuery charges based on the amount of data your SQL queries scan, so writing efficient queries saves money.

Common Mistakes
Running queries without estimating data scanned first.
This can lead to unexpectedly high costs if the query scans large amounts of data.
Use the --dry_run flag to estimate data scanned before running expensive queries.
Using legacy SQL syntax without setting the flag.
Queries may fail or behave unexpectedly because legacy SQL differs from standard SQL.
Always use --use_legacy_sql=false to run standard SQL queries.
Selecting all columns (*) when only a few are needed.
This increases the amount of data scanned and raises costs unnecessarily.
Select only the columns you need in your query.
Summary
Use the bq command-line tool to run SQL queries on BigQuery datasets.
Estimate query costs with --dry_run to avoid unexpected charges.
Use standard SQL syntax by setting --use_legacy_sql=false for better compatibility.