Connection vs Hook in Airflow: Key Differences and Usage

In Airflow, a Connection stores credentials and configuration details for external systems, while a Hook is a Python interface that uses these connections to interact with those systems. Connections hold the data, and Hooks use that data to perform actions like querying databases or calling APIs.

Quick Comparison

This table summarizes the main differences between Connection and Hook in Airflow.

| Aspect | Connection | Hook |
| --- | --- | --- |
| Purpose | Stores credentials and config for external systems | Provides methods to interact with external systems using connections |
| Type | Data storage object in Airflow metadata | Python class with methods for operations |
| Contains | Host, login, password, port, extra info | Code to connect, query, or send data |
| Usage | Referenced by hooks/operators to get connection info | Used in operators to perform tasks |
| Example | MySQL connection with host and password | MySqlHook to run SQL queries |
| Scope | Static configuration | Dynamic interaction |

Key Differences

Connections in Airflow are like address books that store all the details needed to reach an external service, such as a database or API. They hold static information like usernames, passwords, hostnames, and ports. By default this information is stored in Airflow's metadata database, where the password and extra fields are encrypted if a Fernet key is configured, and it can be reused across multiple workflows.
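
Besides the UI and CLI, Airflow can also read a connection from an environment variable named `AIRFLOW_CONN_<CONN_ID>` whose value is a URI. The sketch below uses only the Python standard library to show how such a URI maps onto the fields a Connection holds. `parse_conn_uri` is a hypothetical helper written for this illustration, not Airflow's own parser, and the credentials are the placeholder values used elsewhere in this article.

```python
from urllib.parse import urlsplit, unquote


def parse_conn_uri(uri):
    """Split a connection URI into the fields a Connection stores.

    Hypothetical helper for illustration; not Airflow's implementation.
    """
    parts = urlsplit(uri)
    return {
        "conn_type": parts.scheme,
        "host": parts.hostname,
        "login": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "port": parts.port,
        "schema": parts.path.lstrip("/") or None,
    }


# Placeholder URI matching the example values used in this article.
fields = parse_conn_uri("mysql://user:pass@localhost:3306/testdb")
print(fields)
```

Every value here is plain data: nothing in the mapping knows how to open a socket or run SQL, which is the part a Hook adds.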

Hooks, on the other hand, are Python classes that use these stored connection details to actually communicate with the external system. They contain the code to open connections, send queries, or perform actions. Hooks act as the bridge between Airflow and the external service, using the connection data to do real work.

In short, Connections hold the 'where' and 'how to access' info, while Hooks hold the 'what to do' code. Operators in Airflow typically use hooks to perform tasks, and hooks retrieve connection info to connect properly.

Code Comparison

Here is an example showing how a Connection is defined and then used by a Hook to run a query on a MySQL database.

python
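# Airflow 2+: MySqlHook ships in the apache-airflow-providers-mysql package.
# (The older path airflow.hooks.mysql_hook is deprecated.)
from airflow.providers.mysql.hooks.mysql import MySqlHook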

# Example: Define a connection in Airflow UI or via CLI
# conn_id = 'my_mysql_conn'
# Host: 'localhost', Login: 'user', Password: 'pass', Schema: 'testdb'

# Using MySqlHook to run a query
mysql_hook = MySqlHook(mysql_conn_id='my_mysql_conn')
result = mysql_hook.get_records(sql='SELECT * FROM my_table LIMIT 5')
print(result)
Output
[('row1_col1', 'row1_col2'), ('row2_col1', 'row2_col2'), ...]

Hook Equivalent

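This simplified stand-in shows, in outline, what a hook such as MySqlHook does internally: it looks up the connection details for a conn_id and then uses them to connect and fetch data. It is a simulation with hard-coded values, not the real MySqlHook implementation.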

python
class MySqlHookExample:
    def __init__(self, conn_id):
        self.conn_id = conn_id
        self.connection = self.get_connection()

    def get_connection(self):
        # Simulate fetching connection info from Airflow metadata
        return {
            'host': 'localhost',
            'login': 'user',
            'password': 'pass',
            'schema': 'testdb'
        }

    def get_records(self, sql):
        # Simulate running SQL query using connection info
        print(f"Connecting to {self.connection['host']} as {self.connection['login']}")
        # Here would be the real DB call
        return [('row1_col1', 'row1_col2'), ('row2_col1', 'row2_col2')]

# Usage
hook = MySqlHookExample('my_mysql_conn')
records = hook.get_records('SELECT * FROM my_table LIMIT 5')
print(records)
Output
Connecting to localhost as user
[('row1_col1', 'row1_col2'), ('row2_col1', 'row2_col2')]
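
The simulated hook above returns hard-coded rows. As a rough sketch of the same pattern with a real database call, the version below uses Python's built-in `sqlite3` module, so it runs without Airflow or a MySQL server. The `CONNECTIONS` dictionary, the `my_sqlite_conn` id, and the `SqliteHookSketch` class are all assumptions made for this example. The method names `get_conn`, `run`, and `get_records` echo those found on Airflow's database hooks, but the bodies here are not Airflow's code.

```python
import sqlite3

# Hypothetical stand-in for Airflow's connection store: conn_id -> settings.
CONNECTIONS = {
    "my_sqlite_conn": {"host": ":memory:"},
}


class SqliteHookSketch:
    """Toy hook: looks up connection info by id, then runs SQL with it."""

    def __init__(self, conn_id):
        self.conn_id = conn_id
        self._conn = None

    def get_conn(self):
        # Open the database lazily on first use and reuse it afterwards.
        if self._conn is None:
            settings = CONNECTIONS[self.conn_id]
            self._conn = sqlite3.connect(settings["host"])
        return self._conn

    def run(self, sql, parameters=()):
        # Execute a statement that changes data, then commit it.
        conn = self.get_conn()
        conn.execute(sql, parameters)
        conn.commit()

    def get_records(self, sql, parameters=()):
        # Execute a query and return all rows as a list of tuples.
        return self.get_conn().execute(sql, parameters).fetchall()


hook = SqliteHookSketch("my_sqlite_conn")
hook.run("CREATE TABLE my_table (col1 TEXT, col2 TEXT)")
hook.run("INSERT INTO my_table VALUES (?, ?)", ("row1_col1", "row1_col2"))
hook.run("INSERT INTO my_table VALUES (?, ?)", ("row2_col1", "row2_col2"))
records = hook.get_records("SELECT * FROM my_table LIMIT 5")
print(records)
```

The calling code only ever names the conn_id. Where the database lives is decided entirely by the stored connection settings, so pointing the hook at a different database means editing the connection, not the task.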

When to Use Which

Choose Connection when you need to store or manage credentials and configuration for external systems centrally in Airflow. It is the place to keep your access details safe and reusable.

Choose Hook when you want to write or use code that interacts with an external system, such as running queries, sending data, or calling APIs. Hooks use connections to perform these actions dynamically during workflow execution.

In practice, you rarely use connections alone; hooks depend on connections. So, define connections first, then use hooks to do the work.
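
The ordering in that last point can be illustrated with a small, Airflow-free sketch. Everything in it is hypothetical: the `CONNECTIONS` store, the `ConnectionNotFound` exception, the `ApiHookSketch` class, and the `my_api_conn` id with its `api.example.com` host are invented for the example. It only shows that a hook has nothing to work with until its connection exists, and that once defined, the connection can be shared by any number of hook instances.

```python
# Hypothetical in-memory stand-in for Airflow's connection store.
CONNECTIONS = {}


class ConnectionNotFound(Exception):
    """Raised by this sketch when a conn_id has not been defined."""


class ApiHookSketch:
    """Toy hook that builds a base URL from stored connection details."""

    def __init__(self, conn_id):
        self.conn_id = conn_id

    def base_url(self):
        # Look the connection up at call time, not at construction time.
        try:
            conn = CONNECTIONS[self.conn_id]
        except KeyError:
            raise ConnectionNotFound(
                f"conn_id {self.conn_id!r} is not defined"
            ) from None
        return f"https://{conn['host']}:{conn['port']}"


hook = ApiHookSketch("my_api_conn")

# Connection not defined yet: the hook cannot do its job.
try:
    hook.base_url()
    failed = False
except ConnectionNotFound as err:
    failed = True
    print(err)

# Define the connection once; every hook using that conn_id can now reuse it.
CONNECTIONS["my_api_conn"] = {"host": "api.example.com", "port": 443}
print(hook.base_url())
print(ApiHookSketch("my_api_conn").base_url())
```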

Key Takeaways

Connections store static access details for external systems in Airflow.
Hooks are Python classes that use connections to interact with those systems.
Use connections to manage credentials securely and hooks to perform actions.
Operators use hooks, which internally fetch connection info to connect.
Define connections once; reuse them across multiple hooks and workflows.