
How to Use BigQuery ML: Simple Guide with Examples

Use CREATE MODEL in BigQuery ML to build machine learning models with SQL. Then, use ML.PREDICT to get predictions from your model directly in BigQuery without moving data.

Syntax

A basic CREATE MODEL statement in BigQuery ML has four parts:

  • CREATE MODEL: starts the model creation.
  • model_name: the name you give your model.
  • OPTIONS(): settings such as the model type and the label column.
  • AS SELECT: the SQL query that selects training data.

To predict, pass the trained model and a table or query of input rows to ML.PREDICT. A CREATE MODEL statement has this general form:

sql
CREATE MODEL `project.dataset.model_name`
OPTIONS(
  model_type='linear_reg',
  input_label_cols=['target_column']
) AS
SELECT
  feature1,
  feature2,
  target_column
FROM
  `project.dataset.training_table`;
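
The prediction step mentioned above follows a similar pattern. This is a minimal sketch of its general shape, reusing the placeholder names from the statement above. The table name `project.dataset.new_data_table` is hypothetical and stands for whatever rows you want to score.

sql
/* General shape of a prediction query (placeholder names) */
SELECT
  *
FROM
  ML.PREDICT(
    MODEL `project.dataset.model_name`,
    (
      SELECT
        feature1,
        feature2
      FROM
        `project.dataset.new_data_table`  /* hypothetical input table */
    )
  );

The result contains the input columns plus a prediction column named predicted_<label>, which would be predicted_target_column here.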

Example

This example creates a linear regression model to predict a target value using two features, then shows how to get predictions.

sql
CREATE MODEL `mydataset.my_linear_model`
OPTIONS(
  model_type='linear_reg',
  input_label_cols=['target']
) AS
SELECT
  feature1,
  feature2,
  target
FROM
  `mydataset.training_data`;

SELECT * FROM ML.PREDICT(MODEL `mydataset.my_linear_model`,
  (SELECT feature1, feature2 FROM `mydataset.new_data`));
Output
feature1 | feature2 | predicted_target
-------- | -------- | ----------------
10       | 5        | 23.4
7        | 3        | 17.8
12       | 6        | 26.1
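
Before relying on predictions like these, it can help to check how well the model fits. The original example does not include this step, so treat it as an optional sketch. It uses ML.EVALUATE on the model created above. With no input table, it evaluates on data BigQuery ML reserved during training, or on the training data itself if no split was made.

sql
/* Optional: inspect regression metrics for the trained model */
SELECT
  *
FROM
  ML.EVALUATE(MODEL `mydataset.my_linear_model`);

For a linear regression model, the result is a single row of metrics such as mean_absolute_error and r2_score.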

Common Pitfalls

Common mistakes include:

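  • Leaving out input_label_cols for a supervised model when the label column is not named label. The option defaults to label, so training fails if no such column exists.
  • Using a model_type that BigQuery ML does not support, or options that do not apply to the chosen model type.
  • Ignoring missing or NULL values in the training data. BigQuery ML imputes NULL feature values automatically, which can quietly reduce model quality.
  • Calling ML.PREDICT with input that lacks a feature column the model was trained on. Input column names and types must match the training features.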

Always check your data and options before running.

sql
/* Wrong: input_label_cols is missing and no column is named label */
CREATE MODEL `mydataset.bad_model`
OPTIONS(
  model_type='linear_reg'
) AS
SELECT feature1, feature2, target FROM `mydataset.training_data`;

/* Right: Include input_label_cols */
CREATE MODEL `mydataset.good_model`
OPTIONS(
  model_type='linear_reg',
  input_label_cols=['target']
) AS
SELECT feature1, feature2, target FROM `mydataset.training_data`;
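
The data-related pitfalls above can be checked with plain queries before training and predicting. This is a rough sketch, assuming the same table, column, and model names used in this guide. The first query counts NULLs per column in the training table. The second uses ML.FEATURE_INFO to list the feature columns a trained model expects, so you can compare them with your prediction input.

sql
/* Count missing values per column before training */
SELECT
  COUNT(*) AS total_rows,
  COUNTIF(feature1 IS NULL) AS feature1_nulls,
  COUNTIF(feature2 IS NULL) AS feature2_nulls,
  COUNTIF(target IS NULL) AS target_nulls
FROM
  `mydataset.training_data`;

/* List the input features the trained model expects */
SELECT
  input,
  null_count
FROM
  ML.FEATURE_INFO(MODEL `mydataset.good_model`);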

Quick Reference

Command | Purpose | Example
------- | ------- | -------
CREATE MODEL | Builds a machine learning model | CREATE MODEL `dataset.model` OPTIONS(model_type='linear_reg', input_label_cols=['target']) AS SELECT * FROM `dataset.table`
ML.PREDICT | Generates predictions from a model | SELECT * FROM ML.PREDICT(MODEL `dataset.model`, (SELECT * FROM `dataset.new_data`))
DROP MODEL | Deletes a model | DROP MODEL `dataset.model`
bq ls --models | Lists models in a dataset (bq command-line tool; BigQuery has no SHOW MODELS statement) | bq ls --models dataset
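
When you rerun these commands, two variants are often handy. This sketch uses the same placeholder names as the table. CREATE OR REPLACE MODEL retrains under an existing name instead of failing because the model already exists, and DROP MODEL IF EXISTS deletes a model without raising an error if it is already gone.

sql
/* Retrain and overwrite a model that already exists */
CREATE OR REPLACE MODEL `dataset.model`
OPTIONS(
  model_type='linear_reg',
  input_label_cols=['target']
) AS
SELECT * FROM `dataset.table`;

/* Delete the model; no error if it does not exist */
DROP MODEL IF EXISTS `dataset.model`;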

Key Takeaways

Use SQL commands like CREATE MODEL and ML.PREDICT to build and use models directly in BigQuery.
Always specify the label column with input_label_cols for supervised learning models.
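Check your training data for missing values, and make sure prediction input has the same feature columns as the training data.
BigQuery ML supports multiple model types, including linear regression (linear_reg), logistic regression for classification (logistic_reg), and k-means clustering (kmeans).
Delete models with DROP MODEL, and list the models in a dataset with the bq command-line tool (bq ls --models), since BigQuery has no SHOW MODELS statement.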
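
As an illustration of switching model types, here is a minimal sketch of a classification model. The table `mydataset.labeled_training_data`, the model name, and the label column target_class are hypothetical and are not part of the examples above. Only model_type and the label column differ from the linear regression example.

sql
/* Hypothetical classifier: the label column holds a category, not a number */
CREATE MODEL `mydataset.my_classifier`
OPTIONS(
  model_type='logistic_reg',
  input_label_cols=['target_class']
) AS
SELECT
  feature1,
  feature2,
  target_class
FROM
  `mydataset.labeled_training_data`;

ML.PREDICT on a model like this returns the predicted class in predicted_target_class, along with per-class probabilities in predicted_target_class_probs.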