Apache Airflow · devops · ~10 mins

GCP operators (BigQuery, GCS, Dataflow) in Apache Airflow - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
Task 1: fill in the blank (easy)

Complete the code to create a BigQueryInsertJobOperator that runs a SQL query.

Apache Airflow
bq_query = BigQueryInsertJobOperator(
    task_id='run_query',
    configuration=[1]
)
Options:
A. {'load': {'sourceUris': ['gs://bucket/file.csv'], 'destinationTable': {'projectId': 'proj', 'datasetId': 'ds', 'tableId': 'table'}}}
B. {'copy': {'sourceTable': {'projectId': 'proj', 'datasetId': 'ds', 'tableId': 'table1'}, 'destinationTable': {'projectId': 'proj', 'datasetId': 'ds', 'tableId': 'table2'}}}
C. {'extract': {'sourceTable': {'projectId': 'proj', 'datasetId': 'ds', 'tableId': 'table'}, 'destinationUris': ['gs://bucket/file.csv']}}
D. {'query': {'query': 'SELECT * FROM dataset.table', 'useLegacySql': False}}
Common Mistakes
Using 'load' or 'extract' configurations instead of 'query' for running SQL.
Forgetting to set 'useLegacySql' to False for standard SQL.
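For reference, a completed version of the snippet might look like the sketch below. It assumes the apache-airflow-providers-google package is installed; the query string is a placeholder.

```python
# Assumed import path from the Google provider package.
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

bq_query = BigQueryInsertJobOperator(
    task_id="run_query",
    configuration={
        # A 'query' job runs SQL; 'load', 'copy', and 'extract' are other job types.
        "query": {
            "query": "SELECT * FROM dataset.table",
            "useLegacySql": False,  # run as standard SQL, not legacy SQL
        }
    },
)
```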
Task 2: fill in the blank (medium)

Complete the code to create a GCSToBigQueryOperator that loads data from GCS to BigQuery.

Apache Airflow
load_task = GCSToBigQueryOperator(
    task_id='load_gcs_to_bq',
    bucket='my_bucket',
    source_objects=['data.csv'],
    destination_project_dataset_table=[1],
    write_disposition='WRITE_TRUNCATE'
)
Options:
A. 'table'
B. 'dataset.table'
C. 'project.dataset.table'
D. 'project.table'
Common Mistakes
Using only dataset.table without project.
Using only table name without dataset and project.
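A completed version of the load task might look like this (a sketch; the bucket, object, and table names are placeholders, and the apache-airflow-providers-google package is assumed):

```python
# Assumed import path from the Google provider package.
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

load_task = GCSToBigQueryOperator(
    task_id="load_gcs_to_bq",
    bucket="my_bucket",
    source_objects=["data.csv"],
    # Fully qualified target in project.dataset.table form.
    destination_project_dataset_table="project.dataset.table",
    write_disposition="WRITE_TRUNCATE",  # replace any existing rows on load
)
```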
Task 3: fill in the blank (hard)

Fix the error in the DataflowTemplateOperator code by completing the missing parameter.

Apache Airflow
dataflow_task = DataflowTemplateOperator(
    task_id='run_dataflow',
    template='gs://dataflow-templates/latest/Word_Count',
    job_name='wordcount-job',
    [1]={'inputFile': 'gs://my_bucket/input.txt', 'output': 'gs://my_bucket/output'},
    location='us-central1'
)
Options:
A. parameters
B. options
C. config
D. args
Common Mistakes
Using 'options' or 'args' instead of 'parameters'.
Omitting the parameters dictionary.
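The corrected task might look like the sketch below (assuming the apache-airflow-providers-google package; the bucket paths are placeholders):

```python
# Assumed import path from the Google provider package.
from airflow.providers.google.cloud.operators.dataflow import DataflowTemplateOperator

dataflow_task = DataflowTemplateOperator(
    task_id="run_dataflow",
    template="gs://dataflow-templates/latest/Word_Count",
    job_name="wordcount-job",
    # Runtime values for the template are passed via the `parameters` dict,
    # not `options`, `config`, or `args`.
    parameters={
        "inputFile": "gs://my_bucket/input.txt",
        "output": "gs://my_bucket/output",
    },
    location="us-central1",
)
```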
Task 4: fill in the blank (hard)

Fill both blanks to create a BigQueryCreateEmptyTableOperator that creates a table with a schema.

Apache Airflow
create_table = BigQueryCreateEmptyTableOperator(
    task_id='create_table',
    dataset_id=[1],
    table_id=[2],
    schema_fields=[
        {'name': 'name', 'type': 'STRING', 'mode': 'REQUIRED'},
        {'name': 'age', 'type': 'INTEGER', 'mode': 'NULLABLE'}
    ]
)
Options:
A. 'my_dataset'
B. 'users'
C. 'dataset1'
D. 'table1'
Common Mistakes
Swapping dataset_id and table_id values.
Using project names instead of dataset or table names.
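A filled-in version might look like this sketch (the dataset and table names are illustrative; apache-airflow-providers-google assumed):

```python
# Assumed import path from the Google provider package.
from airflow.providers.google.cloud.operators.bigquery import BigQueryCreateEmptyTableOperator

create_table = BigQueryCreateEmptyTableOperator(
    task_id="create_table",
    dataset_id="my_dataset",  # the dataset that will contain the table
    table_id="users",         # the table created inside that dataset
    schema_fields=[
        {"name": "name", "type": "STRING", "mode": "REQUIRED"},
        {"name": "age", "type": "INTEGER", "mode": "NULLABLE"},
    ],
)
```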
Task 5: fill in the blank (hard)

Fill all three blanks to define a DataflowPythonOperator that runs a Python Dataflow job with arguments.

Apache Airflow
dataflow_python_task = DataflowPythonOperator(
    task_id='run_python_dataflow',
    py_file=[1],
    options=[2],
    py_options=[3]
)
Options:
A. 'gs://my_bucket/dataflow_job.py'
B. {'input': 'gs://my_bucket/input.txt', 'output': 'gs://my_bucket/output'}
C. ['-m']
D. 'dataflow_job.py'
Common Mistakes
Using local file paths instead of GCS URIs for py_file.
Passing options as a list instead of a dictionary.
Omitting py_options or passing it as a string.
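Putting the three answers together gives a sketch like the one below. Note that this operator name is a legacy one: the import path shown is the Airflow 1.x contrib location, and current provider releases replace it with DataflowCreatePythonJobOperator (or BeamRunPythonPipelineOperator from the Apache Beam provider).

```python
# Legacy Airflow 1.x import path (class was spelled DataFlowPythonOperator there);
# newer google-provider releases expose DataflowCreatePythonJobOperator instead.
from airflow.contrib.operators.dataflow_operator import DataFlowPythonOperator

dataflow_python_task = DataFlowPythonOperator(
    task_id="run_python_dataflow",
    py_file="gs://my_bucket/dataflow_job.py",  # a GCS URI, not a local path
    options={                                  # pipeline options go in a dict
        "input": "gs://my_bucket/input.txt",
        "output": "gs://my_bucket/output",
    },
    py_options=["-m"],                         # interpreter flags, passed as a list
)
```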