Apache Airflow · devops · ~10 mins

GCP operators (BigQuery, GCS, Dataflow) in Apache Airflow - Step-by-Step Execution

Process Flow - GCP operators (BigQuery, GCS, Dataflow)
Start DAG → Trigger GCS Operator → Trigger BigQuery Operator → Trigger Dataflow Operator → Check Task Status → Complete DAG
The DAG starts, triggers the GCS, BigQuery, and Dataflow operators in order, checks each task's status, and then completes.
Execution Sample
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.providers.google.cloud.operators.gcs import GCSCreateBucketOperator
from airflow.providers.google.cloud.operators.dataflow import DataflowStartFlexTemplateOperator

with DAG(
    'gcp_operators_dag',
    schedule_interval=None,            # manual triggering only
    start_date=datetime(2024, 1, 1),   # a start_date is required; None breaks scheduling
    catchup=False,
) as dag:
    create_bucket = GCSCreateBucketOperator(task_id='create_bucket', bucket_name='my-bucket')
    run_bq = BigQueryInsertJobOperator(
        task_id='run_bq',
        configuration={"query": {"query": "SELECT 1", "useLegacySql": False}},
    )
    start_dataflow = DataflowStartFlexTemplateOperator(
        task_id='start_dataflow',
        location='us-central1',        # required: the region to launch the flex template in
        body={"launchParameter": {
            "jobName": "my-job",
            "containerSpecGcsPath": "gs://dataflow-templates/latest/flex/Word_Count",
        }},
    )

    create_bucket >> run_bq >> start_dataflow
This Airflow DAG creates a GCS bucket, runs a BigQuery query, then starts a Dataflow job in sequence.
Process Table
Step | Task Triggered | Action                                         | Status  | Next Task
1    | create_bucket  | Create GCS bucket 'my-bucket'                  | Success | run_bq
2    | run_bq         | Run BigQuery SQL query 'SELECT 1'              | Success | start_dataflow
3    | start_dataflow | Start Dataflow job 'my-job' with flex template | Success | None
4    | DAG            | All tasks completed successfully               | Success | End
💡 All tasks executed successfully in order, DAG run completes.
Status Tracker
Variable     | Start | After create_bucket | After run_bq        | After start_dataflow
GCS Bucket   | None  | my-bucket created   | my-bucket exists    | my-bucket exists
BigQuery Job | None  | None                | Query job completed | Query job completed
Dataflow Job | None  | None                | None                | Dataflow job started
Key Moments - 3 Insights
Why does the Dataflow operator run only after BigQuery finishes?
Because the DAG defines dependencies with 'create_bucket >> run_bq >> start_dataflow', each task waits for the previous one to succeed (see Process Table, steps 2 and 3).
What happens if the GCS bucket creation fails?
The DAG run fails at that point: the BigQuery and Dataflow tasks never run, because Airflow marks downstream tasks as upstream_failed when a dependency does not succeed (see Process Table, step 1).
How does Airflow know the status of each GCP operator?
Each operator reports success or failure after execution, and Airflow records that state to decide whether downstream tasks may proceed (see the 'Status' column in the Process Table).
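The failure propagation described above can be sketched in plain Python. This is a simplified model of Airflow's default "all_success" trigger rule, not the real scheduler: a task runs only when every upstream task succeeded, otherwise it is marked upstream_failed.

```python
# Simplified model of Airflow's default "all_success" trigger rule:
# a task runs only if every upstream task succeeded; otherwise it is
# marked "upstream_failed" and blocks its own downstream tasks too.

def run_dag(tasks, deps, fail=frozenset()):
    """tasks: task ids in topological order; deps: {task: [upstream ids]};
    fail: task ids that will fail if they execute."""
    status = {}
    for t in tasks:
        if any(status.get(u) != "success" for u in deps.get(t, [])):
            status[t] = "upstream_failed"   # an upstream task did not succeed
        elif t in fail:
            status[t] = "failed"
        else:
            status[t] = "success"
    return status

order = ["create_bucket", "run_bq", "start_dataflow"]
deps = {"run_bq": ["create_bucket"], "start_dataflow": ["run_bq"]}

print(run_dag(order, deps))
# all three tasks succeed when nothing fails
print(run_dag(order, deps, fail={"create_bucket"}))
# create_bucket fails -> run_bq and start_dataflow become upstream_failed
```

Running the second call reproduces the scenario from the key moment: a failed bucket creation blocks both downstream tasks.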
Visual Quiz - 3 Questions
Test your understanding
Looking at the Process Table, what is the status of the BigQuery task at step 2?
A. Failed
B. Success
C. Running
D. Skipped
💡 Hint
Check the 'Status' column for step 2 in the Process Table.
At which step does the Dataflow job start according to the Process Table?
A. Step 1
B. Step 2
C. Step 3
D. Step 4
💡 Hint
Look for the 'start_dataflow' task in the 'Task Triggered' column.
If the GCS bucket creation failed, what would happen to the BigQuery task?
A. It would be skipped
B. It would run anyway
C. It would run after Dataflow
D. It would retry immediately
💡 Hint
Refer to the Key Moments section on task dependencies and failure impact.
Concept Snapshot
GCP Operators in Airflow:
- GCSCreateBucketOperator creates buckets
- BigQueryInsertJobOperator runs SQL queries
- DataflowStartFlexTemplateOperator starts Dataflow jobs
- Define task order with >> to set dependencies
- Airflow tracks success/failure to control flow
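The ">>" syntax in the snapshot above is ordinary Python operator overloading. A toy class (hypothetical, not the real airflow BaseOperator, which uses set_downstream internally) shows the idea:

```python
# Toy illustration of how Airflow's ">>" syntax records dependencies:
# operators override __rshift__, so "a >> b" marks b as downstream of a.
# Hypothetical minimal class for illustration only.

class Task:
    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []          # task_ids that depend on this task

    def __rshift__(self, other):
        self.downstream.append(other.task_id)
        return other                  # returning 'other' makes a >> b >> c chain

a, b, c = Task("create_bucket"), Task("run_bq"), Task("start_dataflow")
a >> b >> c

print(a.downstream)  # ['run_bq']
print(b.downstream)  # ['start_dataflow']
```

Because __rshift__ returns its right-hand operand, the chain evaluates left to right, which is why 'create_bucket >> run_bq >> start_dataflow' wires up both edges in one line.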
Full Transcript
This visual execution shows an Airflow DAG using GCP operators: first creating a GCS bucket, then running a BigQuery query, and finally starting a Dataflow job. Each task waits for the previous to succeed before running. The execution table tracks each step's action and status. Variables like bucket existence and job completion update after each task. Key moments clarify why tasks run in order and what happens on failure. The quiz tests understanding of task status and dependencies. This helps beginners see how Airflow manages GCP tasks step-by-step.