What is Connection in Airflow: Explanation and Example
connection is a stored set of credentials and parameters that allow Airflow to securely connect to external systems like databases, cloud services, or APIs. It acts like a saved login profile that tasks can use to access resources without hardcoding sensitive information.How It Works
Think of an Airflow connection as a keychain that holds all the keys (credentials) you need to open different doors (external systems). Instead of writing your username, password, host, and other details directly in your workflow code, you save them once in Airflow's connection manager. This way, your tasks can simply ask Airflow for the right key when they need to connect.
Airflow stores these connections securely and makes them easy to manage through its UI or command line. When a task runs, it references the connection by its unique ID, and Airflow provides the necessary details behind the scenes. This keeps your workflows clean, secure, and easier to maintain.
Example
This example shows how to define a connection in Airflow and use it in a PythonOperator to connect to a PostgreSQL database.
from airflow import DAG from airflow.operators.python import PythonOperator from airflow.hooks.base import BaseHook from datetime import datetime import psycopg2 def query_postgres(): # Get connection details by connection ID conn = BaseHook.get_connection('my_postgres_conn') # Connect to PostgreSQL using psycopg2 connection = psycopg2.connect( host=conn.host, user=conn.login, password=conn.password, dbname=conn.schema, port=conn.port ) cursor = connection.cursor() cursor.execute('SELECT version();') result = cursor.fetchone() print(f"PostgreSQL version: {result[0]}") cursor.close() connection.close() with DAG('example_connection_dag', start_date=datetime(2024, 1, 1), schedule_interval='@once', catchup=False) as dag: task = PythonOperator( task_id='query_postgres', python_callable=query_postgres )
When to Use
Use Airflow connections whenever your workflows need to interact with external systems like databases, cloud platforms, message queues, or APIs. They help you avoid putting sensitive information directly in your code, making your workflows safer and easier to update.
For example, if you have a task that reads data from a MySQL database or uploads files to Amazon S3, you create a connection with the required credentials once. Then, any task can use that connection by its ID, simplifying maintenance and improving security.
Key Points
- Connections store credentials and parameters for external systems.
- They keep sensitive info out of your code.
- Airflow provides a UI and CLI to manage connections.
- Tasks reference connections by their unique IDs.
- Using connections improves security and maintainability.