Apache Airflow · devops · ~10 mins

GCP operators (BigQuery, GCS, Dataflow) in Apache Airflow - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
Task 1: fill in the blank (easy)

Complete the code to create a BigQueryInsertJobOperator that runs a SQL query.

Apache Airflow
bq_query = BigQueryInsertJobOperator(
    task_id='run_query',
    configuration=[1]
)
Options:
A. {'load': {'sourceUris': ['gs://bucket/file.csv'], 'destinationTable': {'projectId': 'proj', 'datasetId': 'ds', 'tableId': 'table'}}}
B. {'copy': {'sourceTable': {'projectId': 'proj', 'datasetId': 'ds', 'tableId': 'table1'}, 'destinationTable': {'projectId': 'proj', 'datasetId': 'ds', 'tableId': 'table2'}}}
C. {'extract': {'sourceTable': {'projectId': 'proj', 'datasetId': 'ds', 'tableId': 'table'}, 'destinationUris': ['gs://bucket/file.csv']}}
D. {'query': {'query': 'SELECT * FROM dataset.table', 'useLegacySql': False}}
Common Mistakes
Using 'load' or 'extract' configurations instead of 'query' for running SQL.
Forgetting to set 'useLegacySql' to False for standard SQL.
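For reference, a completed version of the snippet might look like the sketch below. It assumes the apache-airflow-providers-google package is installed; the query string is a placeholder.

```python
# Assumed import path from the Google provider package.
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

bq_query = BigQueryInsertJobOperator(
    task_id="run_query",
    configuration={
        # A 'query' job runs SQL; 'load', 'copy', and 'extract' are other job types.
        "query": {
            "query": "SELECT * FROM dataset.table",
            "useLegacySql": False,  # run as standard SQL, not legacy SQL
        }
    },
)
```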
Task 2: fill in the blank (medium)

Complete the code to create a GCSToBigQueryOperator that loads data from GCS to BigQuery.

Apache Airflow
load_task = GCSToBigQueryOperator(
    task_id='load_gcs_to_bq',
    bucket='my_bucket',
    source_objects=['data.csv'],
    destination_project_dataset_table=[1],
    write_disposition='WRITE_TRUNCATE'
)
Options:
A. 'table'
B. 'dataset.table'
C. 'project.dataset.table'
D. 'project.table'
Common Mistakes
Using only dataset.table without project.
Using only table name without dataset and project.
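A completed version of the load task might look like this (a sketch; the bucket, object, and table names are placeholders, and the apache-airflow-providers-google package is assumed):

```python
# Assumed import path from the Google provider package.
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

load_task = GCSToBigQueryOperator(
    task_id="load_gcs_to_bq",
    bucket="my_bucket",
    source_objects=["data.csv"],
    # Fully qualified target in project.dataset.table form.
    destination_project_dataset_table="project.dataset.table",
    write_disposition="WRITE_TRUNCATE",  # replace any existing rows on load
)
```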
Task 3: fill in the blank (hard)

Fix the error in the DataflowTemplateOperator code by completing the missing parameter.

Apache Airflow
dataflow_task = DataflowTemplateOperator(
    task_id='run_dataflow',
    template='gs://dataflow-templates/latest/Word_Count',
    job_name='wordcount-job',
    [1]={'inputFile': 'gs://my_bucket/input.txt', 'output': 'gs://my_bucket/output'},
    location='us-central1'
)
Options:
A. parameters
B. options
C. config
D. args
Common Mistakes
Using 'options' or 'args' instead of 'parameters'.
Omitting the parameters dictionary.
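The corrected task might look like the sketch below (assuming the apache-airflow-providers-google package; the bucket paths are placeholders):

```python
# Assumed import path from the Google provider package.
from airflow.providers.google.cloud.operators.dataflow import DataflowTemplateOperator

dataflow_task = DataflowTemplateOperator(
    task_id="run_dataflow",
    template="gs://dataflow-templates/latest/Word_Count",
    job_name="wordcount-job",
    # Runtime values for the template are passed via the `parameters` dict,
    # not `options`, `config`, or `args`.
    parameters={
        "inputFile": "gs://my_bucket/input.txt",
        "output": "gs://my_bucket/output",
    },
    location="us-central1",
)
```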
Task 4: fill in the blank (hard)

Fill both blanks to create a BigQueryCreateEmptyTableOperator that creates a table with a schema.

Apache Airflow
create_table = BigQueryCreateEmptyTableOperator(
    task_id='create_table',
    dataset_id=[1],
    table_id=[2],
    schema_fields=[
        {'name': 'name', 'type': 'STRING', 'mode': 'REQUIRED'},
        {'name': 'age', 'type': 'INTEGER', 'mode': 'NULLABLE'}
    ]
)
Options:
A. 'my_dataset'
B. 'users'
C. 'dataset1'
D. 'table1'
Common Mistakes
Swapping dataset_id and table_id values.
Using project names instead of dataset or table names.
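A filled-in version might look like this sketch (the dataset and table names are illustrative; apache-airflow-providers-google assumed):

```python
# Assumed import path from the Google provider package.
from airflow.providers.google.cloud.operators.bigquery import BigQueryCreateEmptyTableOperator

create_table = BigQueryCreateEmptyTableOperator(
    task_id="create_table",
    dataset_id="my_dataset",  # the dataset that will contain the table
    table_id="users",         # the table created inside that dataset
    schema_fields=[
        {"name": "name", "type": "STRING", "mode": "REQUIRED"},
        {"name": "age", "type": "INTEGER", "mode": "NULLABLE"},
    ],
)
```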
Task 5: fill in the blank (hard)

Fill all three blanks to define a DataflowPythonOperator that runs a Python Dataflow job with arguments.

Apache Airflow
dataflow_python_task = DataflowPythonOperator(
    task_id='run_python_dataflow',
    py_file=[1],
    options=[2],
    py_options=[3]
)
Options:
A. 'gs://my_bucket/dataflow_job.py'
B. {'input': 'gs://my_bucket/input.txt', 'output': 'gs://my_bucket/output'}
C. ['-m']
D. 'dataflow_job.py'
Common Mistakes
Using local file paths instead of GCS URIs for py_file.
Passing options as a list instead of a dictionary.
Omitting py_options or passing it as a string.
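Putting the three answers together gives a sketch like the one below. Note that this operator name is a legacy one: the import path shown is the Airflow 1.x contrib location, and current provider releases replace it with DataflowCreatePythonJobOperator (or BeamRunPythonPipelineOperator from the Apache Beam provider).

```python
# Legacy Airflow 1.x import path (class was spelled DataFlowPythonOperator there);
# newer google-provider releases expose DataflowCreatePythonJobOperator instead.
from airflow.contrib.operators.dataflow_operator import DataFlowPythonOperator

dataflow_python_task = DataFlowPythonOperator(
    task_id="run_python_dataflow",
    py_file="gs://my_bucket/dataflow_job.py",  # a GCS URI, not a local path
    options={                                  # pipeline options go in a dict
        "input": "gs://my_bucket/input.txt",
        "output": "gs://my_bucket/output",
    },
    py_options=["-m"],                         # interpreter flags, passed as a list
)
```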