Challenge - 5 Problems
GCP Operators Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Service Behavior
intermediate · 2:00 remaining
BigQueryInsertJobOperator Behavior on Query Execution
Given the following Airflow DAG snippet using BigQueryInsertJobOperator, what will be the value of task_instance.xcom_pull(task_ids='run_query') after successful execution?

Apache Airflow

from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

run_query = BigQueryInsertJobOperator(
    task_id='run_query',
    configuration={
        'query': {
            'query': 'SELECT COUNT(*) as total FROM `my_dataset.my_table`',
            'useLegacySql': False
        }
    }
)
💡 Hint
Think about what BigQueryInsertJobOperator returns by default after running a query job.
✗ Incorrect
BigQueryInsertJobOperator pushes the job ID string as XCom after execution, not the query results themselves.
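Since the XCom value is only the job ID string, fetching the actual rows takes an extra step. A minimal sketch, assuming Airflow 2.x with the Google provider installed and valid GCP credentials (the task and DAG names here are illustrative, not from the original snippet):

```python
# Sketch: the XCom pushed by BigQueryInsertJobOperator is the job ID string,
# not the query results. A downstream task can use that ID to fetch rows.
# Assumes apache-airflow-providers-google is installed; names are illustrative.
from airflow.decorators import task
from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook


@task
def fetch_results(ti=None):
    # Pull the job ID string pushed by the 'run_query' task
    job_id = ti.xcom_pull(task_ids='run_query')
    # Use the underlying google-cloud-bigquery client to look up the job
    client = BigQueryHook(use_legacy_sql=False).get_client()
    job = client.get_job(job_id)
    # QueryJob.result() returns the row iterator for the completed query
    return [dict(row) for row in job.result()]
```

This relies on the google-cloud-bigquery client's get_job / result() calls; in a real DAG the fetch task would simply be placed downstream of run_query.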
❓ Configuration
intermediate · 2:00 remaining
GCSCreateBucketOperator Configuration
Which of the following configurations correctly creates a new Google Cloud Storage bucket named my-new-bucket in the us-central1 location with standard storage class using GCSCreateBucketOperator?

Apache Airflow

from airflow.providers.google.cloud.operators.gcs import GCSCreateBucketOperator

create_bucket = GCSCreateBucketOperator(
    task_id='create_bucket',
    bucket_name='my-new-bucket',
    location='us-central1',
    storage_class='STANDARD'
)
💡 Hint
Check which parameters are required to specify location and storage class explicitly.
✗ Incorrect
Option C correctly sets both location and storage class explicitly. Other options either omit required parameters or use wrong storage class.
❓ Architecture
advanced · 2:30 remaining
Dataflow Template Execution with Airflow
You want to run a Dataflow job using a pre-built template with Airflow's DataflowTemplatedJobStartOperator. Which configuration correctly specifies the job name, template path, and parameters to launch a streaming job?
Apache Airflow
from airflow.providers.google.cloud.operators.dataflow import DataflowTemplatedJobStartOperator

start_dataflow = DataflowTemplatedJobStartOperator(
    task_id='start_dataflow',
    job_name='streaming-job-001',
    template='gs://dataflow-templates/latest/Stream_GCS_Text_to_BigQuery',
    parameters={
        'inputFilePattern': 'gs://my-bucket/input/*.txt',
        'outputTable': 'my-project:dataset.table'
    },
    location='us-central1'
)
💡 Hint
Check the template name and parameter keys carefully for streaming jobs.
✗ Incorrect
Option B uses the correct streaming template and parameter keys. The other options either use a wrong parameter key, use a batch template instead of the streaming one, or omit the required outputTable parameter.
❓ Security
advanced · 2:00 remaining
Service Account Permissions for BigQueryOperator
Which minimum IAM role must the service account have to successfully run a BigQueryInsertJobOperator that executes queries and creates tables in BigQuery?
💡 Hint
Consider the permissions needed to run queries and create tables, not just submit jobs.
✗ Incorrect
roles/bigquery.dataEditor allows creating tables and modifying data (note that executing query jobs additionally requires the bigquery.jobs.create permission, which roles/bigquery.jobUser grants). roles/bigquery.jobUser alone only allows submitting jobs, not modifying data. roles/bigquery.user is too limited for table creation in existing datasets. roles/bigquery.admin grants more than the minimum required.
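In practice the two roles are commonly granted together so the service account can both submit query jobs and create tables. A sketch using gcloud, where the project ID and service-account email are placeholders:

```shell
# Placeholders: my-project and airflow-sa@my-project.iam.gserviceaccount.com
# jobUser grants bigquery.jobs.create (needed to run query jobs);
# dataEditor grants table creation and data modification.
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:airflow-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:airflow-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
```

For tighter scoping, dataEditor can instead be granted on a specific dataset rather than project-wide.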
✅ Best Practice
expert · 3:00 remaining
Optimizing Dataflow Job Launch in Airflow for Cost and Reliability
You want to launch multiple Dataflow jobs from Airflow using DataflowTemplatedJobStartOperator. To optimize cost and ensure reliability, which approach is best?
💡 Hint
Think about balancing resource usage, cost, and job failure handling.
✗ Incorrect
Using Airflow pools to limit concurrent jobs controls cost and resource usage, while adding retries with exponential backoff improves reliability. Launching all jobs in parallel can cause resource contention and higher cost; running them strictly sequentially increases total runtime; and a single consolidated job may not fit all use cases.
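The pool-plus-retries approach can be sketched as follows, assuming a pool named 'dataflow_pool' has already been created in Airflow (Admin → Pools) with a slot count matching the desired concurrency cap; the job name, template, and bucket paths reuse the earlier example's values:

```python
# Sketch: cap concurrent Dataflow launches with an Airflow pool, and retry
# failed launches with exponential backoff for reliability.
# Assumes a pool named 'dataflow_pool' exists; paths/names are illustrative.
from datetime import timedelta

from airflow.providers.google.cloud.operators.dataflow import (
    DataflowTemplatedJobStartOperator,
)

start_dataflow = DataflowTemplatedJobStartOperator(
    task_id='start_dataflow',
    job_name='streaming-job-001',
    template='gs://dataflow-templates/latest/Stream_GCS_Text_to_BigQuery',
    parameters={
        'inputFilePattern': 'gs://my-bucket/input/*.txt',
        'outputTable': 'my-project:dataset.table',
    },
    location='us-central1',
    pool='dataflow_pool',                  # limits concurrent launches (cost control)
    retries=3,                             # retry failed launches
    retry_delay=timedelta(minutes=2),      # initial wait between attempts
    retry_exponential_backoff=True,        # grow the wait on each retry
    max_retry_delay=timedelta(minutes=20), # cap the backoff
)
```

pool, retries, retry_delay, retry_exponential_backoff, and max_retry_delay are standard BaseOperator arguments, so the same pattern applies to any operator that launches external jobs.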