
How to Create a Dataset in BigQuery: Step-by-Step Guide

To create a dataset in BigQuery, use the bq mk command or the BigQuery console, specifying a dataset ID and, optionally, a location. You can also create datasets programmatically through the BigQuery API by defining a Dataset resource with a unique ID and a location.

Syntax

The basic syntax to create a dataset using the bq command-line tool is:

  • bq mk --dataset [PROJECT_ID]:[DATASET_ID] creates a dataset in the specified project.
  • --location flag sets the geographic location (e.g., US, EU).

In the API, you define a Dataset resource with datasetReference.datasetId and an optional location field.

bash
bq mk --dataset [PROJECT_ID]:[DATASET_ID] --location=[LOCATION]
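
For the API route mentioned above, the request body is a Dataset resource. The sketch below only prints that JSON body so the field layout is visible. The function name dataset_body is hypothetical, and including projectId inside datasetReference is an assumption made to keep the target project explicit. Sending the body (to the datasets.insert method, with valid credentials) is deliberately left out.

```shell
# Hypothetical helper: print the JSON body of a Dataset resource.
# Arguments: project ID, dataset ID, optional location.
dataset_body() {
  local project_id="$1"
  local dataset_id="$2"
  local location="${3:-}"
  local body

  body="{\"datasetReference\":{\"projectId\":\"$project_id\",\"datasetId\":\"$dataset_id\"}"
  # location is optional, so it is added only when one was given.
  if [ -n "$location" ]; then
    body="$body,\"location\":\"$location\""
  fi
  printf '%s}\n' "$body"
}

dataset_body my-project my_dataset US
```

The helper does no JSON escaping, which is acceptable here only because dataset IDs are limited to letters, numbers, and underscores.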

Example

This example shows how to create a dataset named my_dataset in the my-project project located in the US:

bash
bq mk --dataset my-project:my_dataset --location=US
Output
Dataset 'my-project:my_dataset' successfully created.
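
Running the example a second time fails because the dataset already exists. One way around that is to check first and create only when needed. The following is a minimal sketch, not an official pattern: ensure_dataset is a hypothetical name, and it assumes that bq show exits with a non-zero status when the dataset is missing.

```shell
# Hypothetical wrapper: create the dataset only if it is not already there.
# Arguments: dataset reference (PROJECT_ID:DATASET_ID) and location.
ensure_dataset() {
  local ref="$1"
  local location="$2"

  # Assumption: `bq show` exits non-zero when the dataset does not exist.
  if bq show --dataset "$ref" >/dev/null 2>&1; then
    echo "Dataset '$ref' already exists, skipping."
  else
    bq mk --dataset --location="$location" "$ref"
  fi
}

# Usage (needs the bq CLI and valid credentials):
# ensure_dataset my-project:my_dataset US
```

Note that any failure of bq show, including a permissions problem, is treated as "missing" here, so the later bq mk call is what surfaces the real error.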

Common Pitfalls

Common mistakes when creating datasets include:

  • Using a dataset ID with invalid characters; only letters, numbers, and underscores are allowed, so spaces and hyphens are rejected.
  • Omitting the project ID, in which case bq falls back to your default project, which may not be the one you intended.
  • Omitting the --location flag when the dataset must live in a specific location; a dataset's location cannot be changed after it is created.
  • Trying to create a dataset that already exists, which causes an error.
bash
bq mk --dataset my-project:my dataset
# Wrong: space in dataset ID

bq mk --dataset my-project:my_dataset
# Right: underscore instead of space
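
The character rule can be checked before bq is ever called. The helper below is a minimal sketch with a hypothetical name, is_valid_dataset_id, and it covers only the letters, numbers, and underscores rule stated above, not any length limit.

```shell
# Hypothetical helper: succeed only if the ID is non-empty and uses
# nothing but letters, digits, and underscores.
is_valid_dataset_id() {
  case "$1" in
    '' | *[!A-Za-z0-9_]*) return 1 ;;  # empty, or contains a disallowed character
    *) return 0 ;;
  esac
}

for id in "my dataset" "my_dataset"; do
  if is_valid_dataset_id "$id"; then
    echo "valid: $id"
  else
    echo "invalid: $id"
  fi
done
```

The loop reports "my dataset" as invalid and "my_dataset" as valid, matching the wrong and right commands above.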

Quick Reference

Command or Field | Description
bq mk --dataset [PROJECT_ID]:[DATASET_ID] | Create a new dataset in a project
--location=[LOCATION] | Set the geographic location of the dataset
datasetReference.datasetId | Unique ID for the dataset in API calls
location | Optional location property in the API Dataset resource

Key Takeaways

  • Use the bq mk command with the --dataset and --location flags to create datasets.
  • Dataset IDs must be unique within a project and contain only letters, numbers, and underscores.
  • Always specify the project ID to avoid creating datasets in the wrong project.
  • Check whether a dataset already exists before creating it to prevent errors.
  • You can create datasets via the BigQuery console, CLI, or API with similar parameters.