Airflow Concept · Beginner · 3 min read

What is Connection in Airflow: Explanation and Example

In Apache Airflow, a connection is a stored set of credentials and parameters that allows Airflow to connect securely to external systems like databases, cloud services, or APIs. It acts like a saved login profile that tasks can use to access resources without hardcoding sensitive information.
⚙️

How It Works

Think of an Airflow connection as a keychain that holds all the keys (credentials) you need to open different doors (external systems). Instead of writing your username, password, host, and other details directly in your workflow code, you save them once in Airflow's connection manager. This way, your tasks can simply ask Airflow for the right key when they need to connect.

Airflow stores these connections securely and makes them easy to manage through its UI or command line. When a task runs, it references the connection by its unique ID, and Airflow provides the necessary details behind the scenes. This keeps your workflows clean, secure, and easier to maintain.
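Besides the UI and CLI, Airflow can also pick up a connection from an environment variable named `AIRFLOW_CONN_<CONN_ID>` that holds a connection URI. The sketch below (plain Python, using only the standard library; the host and credentials are made-up placeholders) shows roughly how the fields of such a URI map onto the connection attributes used later in this article:

```python
from urllib.parse import urlsplit, unquote

# Example connection URI of the kind Airflow reads from an
# AIRFLOW_CONN_<CONN_ID> environment variable.
# All credentials here are made-up placeholder values.
uri = "postgresql://airflow_user:secret@db.example.com:5432/mydb"

parts = urlsplit(uri)
conn_fields = {
    "conn_type": parts.scheme,            # "postgresql"
    "host": parts.hostname,               # "db.example.com"
    "login": unquote(parts.username),     # "airflow_user"
    "password": unquote(parts.password),  # "secret"
    "schema": parts.path.lstrip("/"),     # "mydb"
    "port": parts.port,                   # 5432
}
print(conn_fields)
```

The same field names (`host`, `login`, `password`, `schema`, `port`) appear on the connection object a task retrieves by ID, which is what the example below relies on.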

💻

Example

This example shows how to define a connection in Airflow and use it in a PythonOperator to connect to a PostgreSQL database.
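Before the DAG runs, the connection itself has to exist. One way to register it (assuming the Airflow CLI is available; the host, login, password, and schema below are placeholder values) is from the command line:

```shell
# Register the connection once; tasks then reference it as 'my_postgres_conn'.
# Host, login, password, and schema are placeholder values.
airflow connections add 'my_postgres_conn' \
    --conn-type 'postgres' \
    --conn-host 'localhost' \
    --conn-login 'airflow_user' \
    --conn-password 'secret' \
    --conn-schema 'mydb' \
    --conn-port '5432'
```

You could equally create the same connection through the Admin → Connections page in the Airflow UI.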

```python
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.hooks.base import BaseHook
from datetime import datetime
import psycopg2

def query_postgres():
    # Look up the stored connection by its unique ID
    conn = BaseHook.get_connection('my_postgres_conn')
    # Connect to PostgreSQL using the stored credentials
    connection = psycopg2.connect(
        host=conn.host,
        user=conn.login,
        password=conn.password,
        dbname=conn.schema,
        port=conn.port
    )
    cursor = connection.cursor()
    cursor.execute('SELECT version();')
    result = cursor.fetchone()
    print(f"PostgreSQL version: {result[0]}")
    cursor.close()
    connection.close()

with DAG('example_connection_dag', start_date=datetime(2024, 1, 1), schedule_interval='@once', catchup=False) as dag:
    task = PythonOperator(
        task_id='query_postgres',
        python_callable=query_postgres
    )
```
Output
PostgreSQL version: PostgreSQL 13.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 10.2.0, 64-bit
🎯

When to Use

Use Airflow connections whenever your workflows need to interact with external systems like databases, cloud platforms, message queues, or APIs. They help you avoid putting sensitive information directly in your code, making your workflows safer and easier to update.

For example, if you have a task that reads data from a MySQL database or uploads files to Amazon S3, you create a connection with the required credentials once. Then, any task can use that connection by its ID, simplifying maintenance and improving security.
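The "create once, reference by ID" pattern described above can be illustrated with a plain-Python sketch. This is a toy stand-in for Airflow's connection store, not its real implementation, and every name and credential in it is a made-up placeholder:

```python
# Toy stand-in for Airflow's connection store: credentials are registered
# once under an ID, and any number of tasks look them up by that ID.
# All names and credentials are made-up placeholders.
CONNECTIONS = {}

def add_connection(conn_id, **fields):
    CONNECTIONS[conn_id] = fields

def get_connection(conn_id):
    return CONNECTIONS[conn_id]

# Register once (as you would via the Airflow UI or CLI)...
add_connection("my_mysql_conn",
               host="db.example.com", login="reader", password="secret")

# ...then any task can reference it by ID, with no credentials in task code.
def extract_task():
    conn = get_connection("my_mysql_conn")
    return f"connecting to {conn['host']} as {conn['login']}"

def report_task():
    conn = get_connection("my_mysql_conn")
    return f"reporting from {conn['host']}"

print(extract_task())
print(report_task())
```

Rotating the password now means updating one entry, not editing every task that uses it, which is exactly the maintenance benefit Airflow connections provide.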

Key Points

  • Connections store credentials and parameters for external systems.
  • They keep sensitive info out of your code.
  • Airflow provides a UI and CLI to manage connections.
  • Tasks reference connections by their unique IDs.
  • Using connections improves security and maintainability.

Key Takeaways

  • Airflow connections securely store credentials for external systems.
  • They prevent hardcoding sensitive data in workflow code.
  • Connections are referenced by unique IDs in tasks.
  • Use connections to simplify and secure integrations.
  • Manage connections easily via the Airflow UI or CLI.