How to Use BigQuery with Python: Simple Guide
To use BigQuery with Python, install the google-cloud-bigquery library, create a client with your Google Cloud credentials, and run SQL queries using the client. This lets you interact with BigQuery datasets and tables directly from Python code.
Syntax
Here is the basic syntax to query BigQuery using Python:
- Import the BigQuery client library.
- Create a BigQuery client object.
- Write your SQL query as a string.
- Run the query using the client.
- Iterate over the results.
```python
from google.cloud import bigquery

client = bigquery.Client()
query = "SELECT * FROM `project.dataset.table` LIMIT 10"
query_job = client.query(query)

for row in query_job.result():
    print(row)
```
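The placeholder `project.dataset.table` stands for a fully qualified table name. As a minimal sketch, a small helper can assemble that reference from its three parts and reject obviously malformed input before the query is sent. The name `table_ref` and its checks are illustrative assumptions, not part of the BigQuery library, and the checks are much looser than BigQuery's real naming rules.

```python
def table_ref(project: str, dataset: str, table: str) -> str:
    """Return a backtick-quoted `project.dataset.table` reference.

    Hypothetical helper: it only rejects empty parts and parts containing
    backticks or whitespace; BigQuery's own naming rules are stricter.
    """
    parts = {"project": project, "dataset": dataset, "table": table}
    for name, value in parts.items():
        if not value or "`" in value or any(ch.isspace() for ch in value):
            raise ValueError(f"invalid {name} name: {value!r}")
    return f"`{project}.{dataset}.{table}`"


# Build the query text from the three parts of the table name.
source = table_ref("bigquery-public-data", "samples", "shakespeare")
query = f"SELECT * FROM {source} LIMIT 10"
print(query)
```

The resulting string can be passed to `client.query()` exactly like a hand-written query.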
Example
This example queries the public BigQuery dataset bigquery-public-data.samples.shakespeare and prints five rows. Because the query has no ORDER BY clause, which five rows come back is not guaranteed.
```python
from google.cloud import bigquery

# Create a BigQuery client
client = bigquery.Client()

# Define SQL query
query = "SELECT word, word_count FROM `bigquery-public-data.samples.shakespeare` LIMIT 5"

# Run the query
query_job = client.query(query)

# Print results
for row in query_job.result():
    print(f"Word: {row.word}, Count: {row.word_count}")
```
Output
Word: the, Count: 7989
Word: and, Count: 4006
Word: I, Count: 3865
Word: to, Count: 3606
Word: of, Count: 3536
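For repeatable results, a variant of the query can sum the counts for each word (the table holds one row per word per work) and sort by the total. The sketch below is one way to do that. The function name `top_words` and the `total` alias are assumptions for illustration, and `client` is expected to be a `bigquery.Client` created as in the example.

```python
def top_words(client, limit=5):
    """Return (word, total) pairs for the most frequent words, highest first."""
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    query = (
        "SELECT word, SUM(word_count) AS total "
        "FROM `bigquery-public-data.samples.shakespeare` "
        "GROUP BY word "
        "ORDER BY total DESC, word "  # explicit ordering makes results repeatable
        f"LIMIT {limit}"
    )
    rows = client.query(query).result()  # waits for the job to finish
    return [(row.word, row.total) for row in rows]
```

With a real client, usage would look like `for word, total in top_words(bigquery.Client()): print(word, total)`.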
Common Pitfalls
Common mistakes when using BigQuery with Python include:
- Not setting up Google Cloud authentication properly, causing permission errors.
- Using incorrect project or dataset names in the SQL query.
- Forgetting to install the google-cloud-bigquery library.
- Not checking for errors, or not waiting for the asynchronous query job to finish before reading its results.
If you authenticate with a service account key, make sure the GOOGLE_APPLICATION_CREDENTIALS environment variable points to your service account JSON key file.
```python
import os

from google.cloud import bigquery

# Wrong: no credentials set
# client = bigquery.Client()  # This will fail if no credentials are found

# Right: set the credentials path before creating the client
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your/key.json"
client = bigquery.Client()
```
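A missing or mistyped key path usually surfaces only when the client is created or the first request is made. As a small sketch, the path can be checked up front with the standard library alone. The function name `credentials_problem` is hypothetical, and the check only confirms that the variable is set and names an existing file, not that the key itself is valid. An unset variable is also not always an error, since credentials can come from other sources such as `gcloud auth application-default login`.

```python
import os


def credentials_problem(environ=os.environ):
    """Return a description of the problem, or None if the path looks usable."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return f"key file not found: {path}"
    return None


# Report a problem, if any, before creating the BigQuery client.
problem = credentials_problem()
if problem:
    print(f"Credentials check: {problem}")
```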
Quick Reference
| Step | Description |
|---|---|
| Install library | pip install google-cloud-bigquery |
| Set credentials | Export GOOGLE_APPLICATION_CREDENTIALS environment variable |
| Create client | client = bigquery.Client() |
| Write query | query = "SELECT * FROM `project.dataset.table`" |
| Run query | query_job = client.query(query) |
| Read results | for row in query_job.result(): print(row) |
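The steps in the table can be combined into one function that also surfaces failures: `client.query()` starts a job, and `result()` waits for it and raises if the job failed. This is a minimal sketch under stated assumptions. `run_query` is a hypothetical name, `client` is assumed to be a `bigquery.Client`, and the broad `except Exception` is a simplification; API errors such as a misspelled table name are raised as subclasses of `google.api_core.exceptions.GoogleAPIError`, which is likely a better class to catch in real code.

```python
def run_query(client, sql):
    """Run sql and return the rows as a list of dicts.

    Raises RuntimeError (with the original error chained) if the query
    cannot be started or the job fails.
    """
    try:
        job = client.query(sql)  # starts the query job
        rows = job.result()      # blocks until the job completes
        return [dict(row.items()) for row in rows]
    except Exception as exc:  # simplification: catches every error type
        raise RuntimeError(f"BigQuery query failed: {exc}") from exc
```

Returning plain dicts keeps the caller independent of the library's row type, at the cost of loading all rows into memory.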
Key Takeaways
- Install and import the google-cloud-bigquery Python library before use.
- Set up Google Cloud credentials with the GOOGLE_APPLICATION_CREDENTIALS environment variable.
- Create a BigQuery client object to run SQL queries from Python.
- Use client.query() to execute SQL and iterate over results to access data.
- Check project and dataset names carefully to avoid query errors.