Complete the code to create a BigQueryInsertJobOperator that runs a SQL query.
bq_query = BigQueryInsertJobOperator(
    task_id='run_query',
    configuration=[1]
)

The configuration parameter for BigQueryInsertJobOperator must specify the job type. For running a SQL query, use the "query" configuration with useLegacySql set to False.
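As a sketch of what the completed blank might hold, the dictionary below builds a plausible "query" configuration; the SQL statement and the project/dataset/table path are illustrative placeholders, not part of the exercise:

```python
# Illustrative `configuration` value for a SQL query job.
# The table path `my_project.my_dataset.my_table` is a made-up placeholder.
query_configuration = {
    "query": {
        "query": "SELECT name FROM `my_project.my_dataset.my_table`",
        "useLegacySql": False,  # run as standard SQL, per the explanation above
    }
}
```

The top-level key ("query") is what tells BigQuery which job type to run; other job types (load, extract, copy) use their own top-level keys.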
Complete the code to create a GCSToBigQueryOperator that loads data from GCS to BigQuery.
load_task = GCSToBigQueryOperator(
    task_id='load_gcs_to_bq',
    bucket='my_bucket',
    source_objects=['data.csv'],
    destination_project_dataset_table=[1],
    write_disposition='WRITE_TRUNCATE'
)

The destination_project_dataset_table parameter requires the full path including project, dataset, and table separated by dots.
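A minimal sketch of the dot-separated string the blank expects; the project, dataset, and table names here are placeholders chosen for illustration:

```python
# Full path in the form project.dataset.table, per the explanation above.
destination_project_dataset_table = "my_project.my_dataset.my_table"

# Splitting on dots recovers the three components.
project, dataset, table = destination_project_dataset_table.split(".")
```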
Fix the error in the DataflowTemplateOperator code by completing the missing parameter.
dataflow_task = DataflowTemplateOperator(
    task_id='run_dataflow',
    template='gs://dataflow-templates/latest/Word_Count',
    job_name='wordcount-job',
    [1]={'inputFile': 'gs://my_bucket/input.txt', 'output': 'gs://my_bucket/output'},
    location='us-central1'
)

The parameters argument is used to pass runtime parameters to the Dataflow template.
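To make the shape of that argument concrete, here is the same runtime-parameter dictionary on its own; the keys are the ones the exercise's Word_Count template expects:

```python
# Runtime parameters passed to the Dataflow template, keyed by the
# parameter names the template defines (taken from the exercise snippet).
parameters = {
    "inputFile": "gs://my_bucket/input.txt",
    "output": "gs://my_bucket/output",
}
```

Each key must match a parameter name declared by the template itself; unknown keys are rejected when the job launches.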
Fill both blanks to create a BigQueryCreateEmptyTableOperator that creates a table with a schema.
create_table = BigQueryCreateEmptyTableOperator(
    task_id='create_table',
    dataset_id=[1],
    table_id=[2],
    schema_fields=[
        {'name': 'name', 'type': 'STRING', 'mode': 'REQUIRED'},
        {'name': 'age', 'type': 'INTEGER', 'mode': 'NULLABLE'}
    ]
)

The dataset_id should be the dataset name, and table_id should be the table name to create.
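The schema_fields list is already complete in the exercise; as a quick sanity sketch, each entry should carry the name/type/mode keys that the BigQuery schema format uses:

```python
# Schema fields copied from the exercise, plus a structural check that
# every entry has the name/type/mode keys.
schema_fields = [
    {"name": "name", "type": "STRING", "mode": "REQUIRED"},
    {"name": "age", "type": "INTEGER", "mode": "NULLABLE"},
]

def has_required_keys(field):
    # Every schema entry needs at least a name, a type, and a mode.
    return {"name", "type", "mode"} <= set(field)

all_valid = all(has_required_keys(f) for f in schema_fields)
```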
Fill all three blanks to define a DataflowPythonOperator that runs a Python Dataflow job with arguments.
dataflow_python_task = DataflowPythonOperator(
    task_id='run_python_dataflow',
    py_file=[1],
    options=[2],
    py_options=[3]
)

py_file is the path to the Python file in GCS, options is a dictionary of job arguments, and py_options is a list of Python options like ['-m'].
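A hedged sketch of values the three blanks might take; the GCS path and the option names below are placeholders invented for illustration:

```python
# Illustrative values for the three blanks.
# The pipeline path and option keys are made-up placeholders.
py_file = "gs://my_bucket/pipelines/wordcount.py"  # Python pipeline file in GCS
options = {
    "input": "gs://my_bucket/input.txt",
    "output": "gs://my_bucket/output",
}
py_options = ["-m"]  # interpreter flags, e.g. run the file as a module
```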