Connection management for cloud services in Apache Airflow - Step-by-Step Execution

Process Flow - Connection management for cloud services
1. Define Connection in Airflow UI
2. Store Credentials Securely
3. Use Connection ID in DAG
4. Airflow Retrieves Connection
5. Establish Cloud Service Session
6. Execute Tasks Using Connection
7. Close Connection After Use
This flow shows how Airflow manages cloud service connections: define, store, use in DAG, connect, run tasks, then close.
Execution Sample
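from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# Airflow 2.x expects a start_date on the DAG or the task; this date is an arbitrary example.
with DAG('example_dag', start_date=datetime(2024, 1, 1)) as dag:
    task = BigQueryInsertJobOperator(
        task_id='bq_task',
        configuration={...},  # BigQuery job configuration, elided here
        gcp_conn_id='my_gcp_conn',  # ID of the connection defined in Airflow
    )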
This DAG uses a connection ID 'my_gcp_conn' to connect to Google Cloud BigQuery and run a job.
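The connection behind 'my_gcp_conn' does not have to be created in the UI. Airflow also reads connections from environment variables named `AIRFLOW_CONN_` followed by the upper-cased connection ID, and newer Airflow 2 releases accept a JSON value there. The sketch below only builds such a variable with the standard library. The project ID and key path are hypothetical placeholders, and the exact `extra` field names depend on the installed Google provider version, so treat them as assumptions.

```python
import json
import os


def conn_env_var_name(conn_id: str) -> str:
    """Return the environment variable name checked for a given connection ID."""
    return "AIRFLOW_CONN_" + conn_id.upper()


# Hypothetical values for illustration only; substitute your own.
connection = {
    "conn_type": "google_cloud_platform",
    "extra": {
        "project": "my-example-project",              # assumed project ID
        "key_path": "/path/to/service-account.json",  # assumed key file location
    },
}

env_name = conn_env_var_name("my_gcp_conn")
os.environ[env_name] = json.dumps(connection)

print(env_name)
```

Keeping the definition outside the DAG file means the same DAG code can run against different projects simply by changing the environment it is deployed into.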
Process Table
| Step | Action | Connection ID Used | Airflow Internal State | Result |
|------|--------|--------------------|------------------------|--------|
| 1 | DAG starts execution | N/A | DAG loaded, task ready | Ready to run task |
| 2 | Task requests connection | my_gcp_conn | Fetch connection details from Airflow metadata DB | Connection details retrieved |
| 3 | Airflow creates client session | my_gcp_conn | Session object created with credentials | Session ready for API calls |
| 4 | Task executes cloud API call | my_gcp_conn | Session used to send request | Cloud job started successfully |
| 5 | Task completes | my_gcp_conn | Session closed | Resources freed |
| 6 | DAG run ends | N/A | All tasks finished | DAG run successful |
💡 All tasks completed and connections closed properly
Status Tracker
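| Variable | Start | After Step 2 | After Step 3 | After Step 5 | Final |
|----------|-------|--------------|--------------|--------------|-------|
| connection_details | None | Credentials and config loaded | Same | Same | None (released) |
| session | None | None | Client session object created | Session closed | None |
| task_status | Not started | Waiting for connection | Running | Completed | Success |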
Key Moments - 3 Insights
Why do we use a connection ID instead of hardcoding credentials in the DAG?
Using a connection ID lets Airflow store and manage credentials centrally, so they never appear in DAG code. The task only asks for them by ID at run time, as shown in step 2 of the Process Table.
What happens if the connection details are incorrect or missing?
At step 2 of the Process Table, Airflow cannot retrieve valid connection details, so the task fails before a session is created.
When is the cloud service session closed?
As seen in step 5, after the task completes, Airflow closes the session to free resources and avoid leaks.
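The last two answers can be sketched together as a toy model: the lookup fails before any session exists, and a `finally` block closes the session even when the task raises. Everything here (`CONNECTIONS`, `FakeSession`, `cloud_session`, `ConnectionNotFound`) is a hypothetical stand-in, not Airflow's implementation. In Airflow itself an undefined connection ID typically surfaces as an `AirflowNotFoundException`, and whether a client is explicitly closed depends on the hook in use.

```python
from contextlib import contextmanager


class ConnectionNotFound(Exception):
    """Raised by this toy model when a connection ID is unknown."""


# Toy stand-in for the connection store; not Airflow's metadata database.
CONNECTIONS = {"my_gcp_conn": {"project": "my-example-project"}}


class FakeSession:
    """Minimal pretend client that records whether it was closed."""

    def __init__(self, details):
        self.details = details
        self.closed = False

    def run_job(self, name):
        return f"{name} started in {self.details['project']}"

    def close(self):
        self.closed = True


@contextmanager
def cloud_session(conn_id):
    # Step 2: look up the connection; fail before any session is created.
    try:
        details = CONNECTIONS[conn_id]
    except KeyError:
        raise ConnectionNotFound(f"The conn_id `{conn_id}` isn't defined") from None
    session = FakeSession(details)  # Step 3: create the session
    try:
        yield session               # Step 4: the task uses the session
    finally:
        session.close()             # Step 5: always release the session


with cloud_session("my_gcp_conn") as session:
    result = session.run_job("bq_task")
```

Calling `cloud_session` with an ID that is not in `CONNECTIONS` raises `ConnectionNotFound` on entry, so no session object is ever built for it.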
Visual Quiz - 3 Questions
Test your understanding
Look at the Process Table. At which step does Airflow create the client session?
A. Step 2
B. Step 4
C. Step 3
D. Step 5
💡 Hint
Check the 'Airflow Internal State' column for 'Session object created'
According to the Status Tracker, what is the state of 'session' after step 5?
A. Session closed
B. Client session object created
C. Credentials and config loaded
D. Running
💡 Hint
Look at the 'session' row under 'After Step 5' column
If the connection ID is changed in the DAG, which step in the Process Table is directly affected?
A. Step 1
B. Step 2
C. Step 4
D. Step 6
💡 Hint
Step 2 shows fetching connection details using the connection ID
Concept Snapshot
Airflow connection management:
- Define connections in Airflow UI with IDs
- Use connection ID in DAG operators
- Airflow fetches credentials at runtime
- Creates and uses session for cloud API calls
- Closes session after task completes
- Keeps credentials secure and reusable
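"Fetches credentials at runtime" is not limited to the metadata database. By default Airflow checks a configured secrets backend first, then environment variables, then the metadata database, and uses the first match. The sketch below models that fall-through with plain dictionaries. The source names and stored values are hypothetical and only illustrate the ordering.

```python
def first_match(conn_id, sources):
    """Return (source_name, value) from the first source that knows conn_id, else None."""
    for name, store in sources:
        value = store.get(conn_id)
        if value is not None:
            return name, value
    return None


# Hypothetical contents; each dict stands in for one lookup source, in search order.
SOURCES = [
    ("secrets_backend", {}),
    ("environment", {"my_gcp_conn": "definition-from-env"}),
    ("metadata_db", {"my_gcp_conn": "definition-from-db", "other_conn": "definition-from-db"}),
]

print(first_match("my_gcp_conn", SOURCES))
print(first_match("other_conn", SOURCES))
print(first_match("missing_conn", SOURCES))
```

Because the environment entry shadows the database entry for the same ID, a connection edited in the UI may appear to have no effect when an environment variable with that ID is also set.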
Full Transcript
In Airflow, connection management for cloud services means defining connection details, such as credentials, in the Airflow UI. These connections are stored in Airflow's metadata database, where sensitive fields are encrypted when a Fernet key is configured. When a DAG runs, each task uses a connection ID to ask Airflow for the credentials. Airflow then creates a client session to the cloud service using those credentials, and the task uses this session to perform cloud operations such as starting a BigQuery job. After the task finishes, the session is closed to free resources. This process keeps credentials out of DAG code and makes it easy to reuse one connection across many DAGs.