Apache Airflow · DevOps · ~20 mins

GCP operators (BigQuery, GCS, Dataflow) in Apache Airflow - Practice Problems & Coding Challenges

Challenge - 5 Problems
Service Behavior
intermediate
BigQueryInsertJobOperator Behavior on Query Execution
Given the following Airflow DAG snippet using BigQueryInsertJobOperator, what will be the value of task_instance.xcom_pull(task_ids='run_query') after successful execution?
Apache Airflow
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

run_query = BigQueryInsertJobOperator(
    task_id='run_query',
    configuration={
        'query': {
            'query': 'SELECT COUNT(*) as total FROM `my_dataset.my_table`',
            'useLegacySql': False
        }
    }
)
A. The integer count of rows returned by the query.
B. A list of dictionaries representing the query result rows.
C. None, because BigQueryInsertJobOperator does not push any XCom value.
D. A string containing the job ID of the executed query.
💡 Hint
Think about what BigQueryInsertJobOperator returns by default after running a query job.
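To see why a job ID is a plausible XCom value at all, here is a toy model of Airflow's return-value XCom mechanics. The `FakeXCom` class and the job-ID string are illustrative, not real Airflow APIs: Airflow pushes an operator's `execute()` return value to XCom under the key `return_value`, and `BigQueryInsertJobOperator.execute()` returns the ID of the submitted query job rather than the result rows.

```python
# Toy model of Airflow's XCom push/pull semantics -- illustrative only,
# not the real Airflow API. Airflow pushes an operator's execute()
# return value to XCom under the key 'return_value'.
class FakeXCom:
    def __init__(self):
        self._store = {}

    def push(self, task_id, value, key="return_value"):
        self._store[(task_id, key)] = value

    def pull(self, task_ids, key="return_value"):
        return self._store.get((task_ids, key))


def fake_bigquery_insert_job_execute():
    # BigQueryInsertJobOperator.execute() returns the job ID of the
    # submitted query job (a string), not the query's result rows.
    return "airflow_run_query_abc123"  # hypothetical job ID


xcom = FakeXCom()
xcom.push("run_query", fake_bigquery_insert_job_execute())
pulled = xcom.pull(task_ids="run_query")
print(pulled)  # a job-ID string
```

A downstream task that needs the actual row counts would use the job ID to fetch results via a BigQuery hook or client, since the rows themselves are never placed in XCom.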
Configuration
intermediate
GCSCreateBucketOperator Configuration
Which of the following configurations correctly creates a new Google Cloud Storage bucket named my-new-bucket in the us-central1 location with standard storage class using GCSCreateBucketOperator?
Apache Airflow
from airflow.providers.google.cloud.operators.gcs import GCSCreateBucketOperator

create_bucket = GCSCreateBucketOperator(
    task_id='create_bucket',
    bucket_name='my-new-bucket',
    location='us-central1',
    storage_class='STANDARD'
)
A. GCSCreateBucketOperator(task_id='create_bucket', bucket_name='my-new-bucket', storage_class='STANDARD')
B. GCSCreateBucketOperator(task_id='create_bucket', bucket_name='my-new-bucket', location='us-central1', storage_class='MULTI_REGIONAL')
C. GCSCreateBucketOperator(task_id='create_bucket', bucket_name='my-new-bucket', location='us-central1', storage_class='STANDARD')
D. GCSCreateBucketOperator(task_id='create_bucket', bucket_name='my-new-bucket', location='us-central1')
💡 Hint
Check which parameters are required to specify location and storage class explicitly.
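A common stumbling block here is casing: the GCS JSON API expects the `storageClass` value in upper case ('STANDARD', 'NEARLINE', 'COLDLINE', 'ARCHIVE'). The helper below is an illustrative sketch, not part of the Google provider, showing how one might normalize and validate a value before handing it to the operator:

```python
# Illustrative helper (not part of the Google provider): normalize a
# GCS storage-class value to the upper-case form the GCS JSON API expects.
VALID_STORAGE_CLASSES = {
    "STANDARD", "NEARLINE", "COLDLINE", "ARCHIVE",
    "MULTI_REGIONAL", "REGIONAL",  # legacy classes, still accepted
}

def normalize_storage_class(value: str) -> str:
    normalized = value.strip().upper().replace("-", "_")
    if normalized not in VALID_STORAGE_CLASSES:
        raise ValueError(f"unknown storage class: {value!r}")
    return normalized

print(normalize_storage_class("standard"))        # STANDARD
print(normalize_storage_class("multi-regional"))  # MULTI_REGIONAL
```

Validating early like this turns a late API error from GCS into an immediate, readable failure at DAG-parse or task-start time.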
Architecture
advanced
Dataflow Template Execution with Airflow
You want to run a Dataflow job using a pre-built template with Airflow's DataflowTemplatedJobStartOperator. Which configuration correctly specifies the job name, template path, and parameters to launch a streaming job?
Apache Airflow
from airflow.providers.google.cloud.operators.dataflow import DataflowTemplatedJobStartOperator

start_dataflow = DataflowTemplatedJobStartOperator(
    task_id='start_dataflow',
    job_name='streaming-job-001',
    template='gs://dataflow-templates/latest/Stream_GCS_Text_to_BigQuery',
    parameters={
        'inputFilePattern': 'gs://my-bucket/input/*.txt',
        'outputTable': 'my-project:dataset.table'
    },
    location='us-central1'
)
A. job_name='streaming-job-001', template='gs://dataflow-templates/latest/Batch_GCS_Text_to_BigQuery', parameters={'inputFilePattern': 'gs://my-bucket/input/*.txt', 'outputTable': 'my-project:dataset.table'}, location='us-central1'
B. job_name='streaming-job-001', template='gs://dataflow-templates/latest/Stream_GCS_Text_to_BigQuery', parameters={'inputFilePattern': 'gs://my-bucket/input/*.txt', 'outputTable': 'my-project:dataset.table'}, location='us-central1'
C. job_name='streaming-job-001', template='gs://dataflow-templates/latest/Stream_GCS_Text_to_BigQuery', parameters={'inputFile': 'gs://my-bucket/input/*.txt', 'outputTable': 'my-project:dataset.table'}, location='us-central1'
D. job_name='streaming-job-001', template='gs://dataflow-templates/latest/Stream_GCS_Text_to_BigQuery', parameters={'inputFilePattern': 'gs://my-bucket/input/*.txt'}, location='us-central1'
💡 Hint
Check the template name and parameter keys carefully for streaming jobs.
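The traps in options C and D (a misspelled key, a missing key) can be caught before a Dataflow job ever launches by validating the `parameters` dict against the keys the template expects. The required-key set below covers only the two parameters this question focuses on; the real Stream_GCS_Text_to_BigQuery template takes additional parameters, so treat this as an illustrative pre-flight sketch:

```python
# Illustrative pre-flight check for Dataflow template parameters.
# Only the two keys this question focuses on are listed; the real
# Stream_GCS_Text_to_BigQuery template accepts more.
REQUIRED_PARAMS = {"inputFilePattern", "outputTable"}

def missing_params(parameters: dict) -> set:
    """Return the required template parameter keys absent from `parameters`."""
    return REQUIRED_PARAMS - parameters.keys()

good = {
    "inputFilePattern": "gs://my-bucket/input/*.txt",
    "outputTable": "my-project:dataset.table",
}
bad = {
    "inputFile": "gs://my-bucket/input/*.txt",  # wrong key, as in option C
    "outputTable": "my-project:dataset.table",
}

print(missing_params(good))  # set()
print(missing_params(bad))   # {'inputFilePattern'}
```

Dataflow itself rejects launches with missing required parameters, but checking in the DAG fails faster and with a clearer message.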
Security
advanced
Service Account Permissions for BigQueryOperator
Which minimum IAM role must the service account have to successfully run a BigQueryInsertJobOperator that executes queries and creates tables in BigQuery?
A. roles/bigquery.dataEditor
B. roles/bigquery.user
C. roles/bigquery.jobUser
D. roles/bigquery.admin
💡 Hint
Consider the permissions needed to run queries and create tables, not just submit jobs.
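To reason about the hint, it helps to lay out which BigQuery permissions each predefined role actually carries. The mapping below is a simplified subset of the real IAM role definitions, written here only to make the comparison concrete; consult the current IAM documentation for the authoritative permission lists:

```python
# Simplified subset of BigQuery IAM role permissions -- illustrative,
# not the authoritative role definitions.
ROLE_PERMS = {
    "roles/bigquery.jobUser": {"bigquery.jobs.create"},
    "roles/bigquery.user": {"bigquery.jobs.create", "bigquery.datasets.get"},
    "roles/bigquery.dataEditor": {
        "bigquery.tables.create",
        "bigquery.tables.updateData",
        "bigquery.datasets.get",
    },
    "roles/bigquery.admin": {
        "bigquery.jobs.create",
        "bigquery.tables.create",
        "bigquery.tables.updateData",
        "bigquery.datasets.get",
    },
}

# Running a query job needs bigquery.jobs.create; creating tables from
# its results needs bigquery.tables.create on the target dataset.
NEEDED = {"bigquery.jobs.create", "bigquery.tables.create"}

for role, perms in ROLE_PERMS.items():
    print(role, "covers both" if NEEDED <= perms else "missing some")
```

In practice the least-privilege setup is often a combination (e.g. jobUser at the project level plus dataEditor on the target dataset) rather than a single broad role.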
Best Practice
expert
Optimizing Dataflow Job Launch in Airflow for Cost and Reliability
You want to launch multiple Dataflow jobs from Airflow using DataflowTemplatedJobStartOperator. To optimize cost and ensure reliability, which approach is best?
A. Use Airflow pools to limit concurrent Dataflow jobs and add retries with exponential backoff.
B. Use a single Dataflow job to process all data instead of multiple jobs to reduce complexity.
C. Run Dataflow jobs sequentially with no retries to avoid overlapping resource usage.
D. Launch all Dataflow jobs in parallel without dependencies to minimize total runtime.
💡 Hint
Think about balancing resource usage, cost, and job failure handling.
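The retry half of option A can be sketched numerically. With `retry_exponential_backoff=True`, Airflow roughly doubles the wait between attempts (adding jitter, omitted here) up to `max_retry_delay`; the helper below is an illustrative simplification of that schedule, not Airflow's exact algorithm:

```python
from datetime import timedelta

def backoff_delays(retry_delay: timedelta, retries: int,
                   max_retry_delay: timedelta) -> list:
    """Illustrative doubling schedule (Airflow adds jitter on top)."""
    return [min(retry_delay * 2 ** attempt, max_retry_delay)
            for attempt in range(retries)]

# In a real DAG, the corresponding task kwargs would look like:
#   pool="dataflow_pool", retries=4, retry_delay=timedelta(minutes=1),
#   retry_exponential_backoff=True, max_retry_delay=timedelta(minutes=5)
delays = backoff_delays(timedelta(minutes=1), retries=4,
                        max_retry_delay=timedelta(minutes=5))
print([int(d.total_seconds()) for d in delays])  # [60, 120, 240, 300]
```

The pool (created in the Airflow UI or CLI, with a slot count matching your Dataflow quota) caps concurrency and cost, while the backoff spreads out retries so transient launch failures do not hammer the Dataflow API.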