Challenge - 5 Problems
GCP Operators Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Service Behavior
intermediate · 2:00 remaining
BigQueryInsertJobOperator Behavior on Query Execution
Given the following Airflow DAG snippet using BigQueryInsertJobOperator, what will be the value of task_instance.xcom_pull(task_ids='run_query') after successful execution?

Apache Airflow

from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

run_query = BigQueryInsertJobOperator(
    task_id='run_query',
    configuration={
        'query': {
            'query': 'SELECT COUNT(*) as total FROM `my_dataset.my_table`',
            'useLegacySql': False
        }
    }
)
💡 Hint
Think about what BigQueryInsertJobOperator returns by default after running a query job.
✗ Incorrect
BigQueryInsertJobOperator pushes the job ID string as XCom after execution, not the query results themselves.
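Since the XCom value is only the job ID string, fetching the actual rows takes an extra step. A minimal sketch, assuming Airflow 2.x with the Google provider installed and valid GCP credentials (the task and DAG names here are illustrative, not from the original snippet):

```python
# Sketch: the XCom pushed by BigQueryInsertJobOperator is the job ID string,
# not the query results. A downstream task can use that ID to fetch rows.
# Assumes apache-airflow-providers-google is installed; names are illustrative.
from airflow.decorators import task
from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook


@task
def fetch_results(ti=None):
    # Pull the job ID string pushed by the 'run_query' task
    job_id = ti.xcom_pull(task_ids='run_query')
    # Use the underlying google-cloud-bigquery client to look up the job
    client = BigQueryHook(use_legacy_sql=False).get_client()
    job = client.get_job(job_id)
    # QueryJob.result() returns the row iterator for the completed query
    return [dict(row) for row in job.result()]
```

This relies on the google-cloud-bigquery client's get_job / result() calls; in a real DAG the fetch task would simply be placed downstream of run_query.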
❓ Configuration
intermediate · 2:00 remaining
GCSCreateBucketOperator Configuration
Which of the following configurations correctly creates a new Google Cloud Storage bucket named my-new-bucket in the us-central1 location with standard storage class using GCSCreateBucketOperator?

Apache Airflow

from airflow.providers.google.cloud.operators.gcs import GCSCreateBucketOperator

create_bucket = GCSCreateBucketOperator(
    task_id='create_bucket',
    bucket_name='my-new-bucket',
    location='us-central1',
    storage_class='STANDARD'
)
💡 Hint
Check which parameters are required to specify location and storage class explicitly.
✗ Incorrect
Option C correctly sets both location and storage class explicitly. Other options either omit required parameters or use wrong storage class.
❓ Architecture
advanced · 2:30 remaining
Dataflow Template Execution with Airflow
You want to run a Dataflow job using a pre-built template with Airflow's DataflowTemplatedJobStartOperator. Which configuration correctly specifies the job name, template path, and parameters to launch a streaming job?
Apache Airflow
from airflow.providers.google.cloud.operators.dataflow import DataflowTemplatedJobStartOperator

start_dataflow = DataflowTemplatedJobStartOperator(
    task_id='start_dataflow',
    job_name='streaming-job-001',
    template='gs://dataflow-templates/latest/Stream_GCS_Text_to_BigQuery',
    parameters={
        'inputFilePattern': 'gs://my-bucket/input/*.txt',
        'outputTable': 'my-project:dataset.table'
    },
    location='us-central1'
)
💡 Hint
Check the template name and parameter keys carefully for streaming jobs.
✗ Incorrect
Option B uses the correct streaming template and parameter keys. The other options either use a wrong parameter key, use a batch template instead of the streaming one, or omit the required outputTable parameter.
❓ Security
advanced · 2:00 remaining
Service Account Permissions for BigQueryOperator
Which minimum IAM role must the service account have to successfully run a BigQueryInsertJobOperator that executes queries and creates tables in BigQuery?
💡 Hint
Consider the permissions needed to run queries and create tables, not just submit jobs.
✗ Incorrect
roles/bigquery.dataEditor allows creating tables and modifying data (note that executing query jobs additionally requires the bigquery.jobs.create permission, which roles/bigquery.jobUser grants). roles/bigquery.jobUser alone only allows submitting jobs, not modifying data. roles/bigquery.user is too limited for table creation in existing datasets. roles/bigquery.admin grants more than the minimum required.
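In practice the two roles are commonly granted together so the service account can both submit query jobs and create tables. A sketch using gcloud, where the project ID and service-account email are placeholders:

```shell
# Placeholders: my-project and airflow-sa@my-project.iam.gserviceaccount.com
# jobUser grants bigquery.jobs.create (needed to run query jobs);
# dataEditor grants table creation and data modification.
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:airflow-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:airflow-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
```

For tighter scoping, dataEditor can instead be granted on a specific dataset rather than project-wide.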
✅ Best Practice
expert · 3:00 remaining
Optimizing Dataflow Job Launch in Airflow for Cost and Reliability
You want to launch multiple Dataflow jobs from Airflow using DataflowTemplatedJobStartOperator. To optimize cost and ensure reliability, which approach is best?
💡 Hint
Think about balancing resource usage, cost, and job failure handling.
✗ Incorrect
Using Airflow pools to limit concurrent jobs controls cost and resource usage, while adding retries with exponential backoff improves reliability. Launching all jobs in parallel can cause resource contention and higher cost; running them strictly sequentially increases total runtime; and a single consolidated job may not fit all use cases.
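The pool-plus-retries approach can be sketched as follows, assuming a pool named 'dataflow_pool' has already been created in Airflow (Admin → Pools) with a slot count matching the desired concurrency cap; the job name, template, and bucket paths reuse the earlier example's values:

```python
# Sketch: cap concurrent Dataflow launches with an Airflow pool, and retry
# failed launches with exponential backoff for reliability.
# Assumes a pool named 'dataflow_pool' exists; paths/names are illustrative.
from datetime import timedelta

from airflow.providers.google.cloud.operators.dataflow import (
    DataflowTemplatedJobStartOperator,
)

start_dataflow = DataflowTemplatedJobStartOperator(
    task_id='start_dataflow',
    job_name='streaming-job-001',
    template='gs://dataflow-templates/latest/Stream_GCS_Text_to_BigQuery',
    parameters={
        'inputFilePattern': 'gs://my-bucket/input/*.txt',
        'outputTable': 'my-project:dataset.table',
    },
    location='us-central1',
    pool='dataflow_pool',                  # limits concurrent launches (cost control)
    retries=3,                             # retry failed launches
    retry_delay=timedelta(minutes=2),      # initial wait between attempts
    retry_exponential_backoff=True,        # grow the wait on each retry
    max_retry_delay=timedelta(minutes=20), # cap the backoff
)
```

pool, retries, retry_delay, retry_exponential_backoff, and max_retry_delay are standard BaseOperator arguments, so the same pattern applies to any operator that launches external jobs.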