Connection vs Hook in Airflow: Key Differences and Usage
A Connection stores credentials and configuration details for an external system, while a Hook is a Python interface that uses a connection to interact with that system. Connections hold the data, and Hooks use that data to perform actions like querying databases or calling APIs.
Quick Comparison
This table summarizes the main differences between Connection and Hook in Airflow.
| Aspect | Connection | Hook |
|---|---|---|
| Purpose | Stores credentials and config for external systems | Provides methods to interact with external systems using connections |
| Type | Data storage object in Airflow metadata | Python class with methods for operations |
| Contains | Host, login, password, port, extra info | Code to connect, query, or send data |
| Usage | Referenced by hooks/operators to get connection info | Used in operators to perform tasks |
| Example | MySQL connection with host and password | MySqlHook to run SQL queries |
| Scope | Static configuration | Dynamic interaction |
Key Differences
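Connections in Airflow are like address books that store all the details needed to reach an external service, such as a database or API. They hold static information like usernames, passwords, hostnames, and ports. By default this information lives in Airflow's metadata database, where the password and extra fields are encrypted when a Fernet key is configured; connections can also be supplied through environment variables or a secrets backend. Either way, a connection is referenced by its ID and can be reused across multiple workflows.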
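As an illustration of a connection being pure data, Airflow can read a connection from an environment variable named AIRFLOW_CONN_<CONN_ID> (upper-cased) whose value is a connection URI. The sketch below uses only the standard library to build such a URI for the my_mysql_conn example. The helper build_conn_uri and the sample password are hypothetical, and port 3306 is an assumption (the MySQL default) that the original example does not state.

```python
import os
from urllib.parse import quote, unquote, urlsplit


def build_conn_uri(conn_type, login, password, host, port, schema):
    """Assemble a connection URI, percent-encoding the credentials.

    Hypothetical helper for illustration; not part of Airflow.
    """
    return (
        f"{conn_type}://{quote(login, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{schema}"
    )


# Sample values only; the password contains characters that need encoding.
uri = build_conn_uri("mysql", "user", "p@ss/word", "localhost", 3306, "testdb")

# The variable name is AIRFLOW_CONN_ followed by the upper-cased conn_id.
os.environ["AIRFLOW_CONN_MY_MYSQL_CONN"] = uri

# Parse the URI back to show that it carries only static settings.
parts = urlsplit(uri)
print(parts.scheme, parts.hostname, parts.port, parts.username)
print(unquote(parts.password), parts.path.lstrip("/"))
```

Nothing in this snippet talks to a database: the connection is just a bundle of settings waiting for a hook to use it.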
Hooks, on the other hand, are Python classes that use these stored connection details to actually communicate with the external system. They contain the code to open connections, send queries, or perform actions. Hooks act as the bridge between Airflow and the external service, using the connection data to do real work.
In short, Connections hold the 'where' and 'how to access' info, while Hooks hold the 'what to do' code. Operators in Airflow typically use hooks to perform tasks, and hooks retrieve connection info to connect properly.
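The layering described above (operator calls hook, hook resolves connection) can be sketched with stand-in classes. This is a minimal illustration, not Airflow's implementation: ConnectionSketch, CONNECTIONS, SqliteHookSketch, and load_and_count are all hypothetical names, and an in-memory SQLite database stands in for the external system so the snippet runs without Airflow or a database server.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionSketch:
    """Stand-in for a Connection: data only, no behaviour."""
    conn_id: str
    host: str  # for SQLite, the database path


# Stand-in for wherever connections are stored (metadata DB, env vars, ...).
CONNECTIONS = {
    "my_sqlite_conn": ConnectionSketch(conn_id="my_sqlite_conn", host=":memory:"),
}


class SqliteHookSketch:
    """Stand-in for a Hook: behaviour only; settings come from the connection."""

    def __init__(self, conn_id):
        self.conn_id = conn_id
        self._db = None

    def get_conn(self):
        # Resolve the connection lazily and reuse the open handle afterwards.
        if self._db is None:
            settings = CONNECTIONS[self.conn_id]
            self._db = sqlite3.connect(settings.host)
        return self._db

    def run(self, sql, parameters=()):
        db = self.get_conn()
        db.execute(sql, parameters)
        db.commit()

    def get_records(self, sql, parameters=()):
        return self.get_conn().execute(sql, parameters).fetchall()


def load_and_count(conn_id):
    """Stand-in for an operator's task logic: it only knows the conn_id."""
    hook = SqliteHookSketch(conn_id)
    hook.run("CREATE TABLE my_table (id INTEGER, name TEXT)")
    hook.run("INSERT INTO my_table VALUES (?, ?)", (1, "alpha"))
    hook.run("INSERT INTO my_table VALUES (?, ?)", (2, "beta"))
    return hook.get_records("SELECT id, name FROM my_table ORDER BY id")


rows = load_and_count("my_sqlite_conn")
print(rows)
```

The task code never sees a host or password. It passes a connection ID to the hook, and the hook is the only piece that knows how to turn those settings into a live connection.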
Code Comparison
Here is an example showing how a Connection is defined and then used by a Hook to run a query on a MySQL database. The import path assumes Airflow 2 or later with the MySQL provider package (apache-airflow-providers-mysql) installed; the airflow.hooks.mysql_hook path dates from Airflow 1.x.
```python
from airflow.providers.mysql.hooks.mysql import MySqlHook

# Define the connection beforehand in the Airflow UI or via the CLI:
#   conn_id: 'my_mysql_conn'
#   Host: 'localhost', Login: 'user', Password: 'pass', Schema: 'testdb'

# The hook looks up the connection by its ID and uses it to run the query
mysql_hook = MySqlHook(mysql_conn_id='my_mysql_conn')
result = mysql_hook.get_records(sql='SELECT * FROM my_table LIMIT 5')
print(result)
```
Hook Equivalent
This simplified simulation shows the idea behind how a hook such as MySqlHook uses connection details internally to connect and fetch data. It is not the real implementation: the connection lookup and the query result are hard-coded.
```python
class MySqlHookExample:
    def __init__(self, conn_id):
        self.conn_id = conn_id
        self.connection = self.get_connection()

    def get_connection(self):
        # Simulate fetching connection info from Airflow metadata
        return {
            'host': 'localhost',
            'login': 'user',
            'password': 'pass',
            'schema': 'testdb',
        }

    def get_records(self, sql):
        # Simulate running SQL query using connection info
        print(f"Connecting to {self.connection['host']} as {self.connection['login']}")
        # Here would be the real DB call
        return [('row1_col1', 'row1_col2'), ('row2_col1', 'row2_col2')]


# Usage
hook = MySqlHookExample('my_mysql_conn')
records = hook.get_records('SELECT * FROM my_table LIMIT 5')
print(records)
```
When to Use Which
Choose Connection when you need to store or manage credentials and configuration for external systems centrally in Airflow. It is the place to keep your access details safe and reusable.
Choose Hook when you want to write or use code that interacts with an external system, such as running queries, sending data, or calling APIs. Hooks use connections to perform these actions dynamically during workflow execution.
In practice, you rarely use connections alone; hooks depend on connections. So, define connections first, then use hooks to do the work.
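The "define connections first" ordering can be illustrated with a small standard-library sketch. It assumes connections are supplied as AIRFLOW_CONN_<CONN_ID> URIs and uses a plain dict in place of the real environment. ConnectionNotDefined, lookup_connection, and describe_target are hypothetical names for this illustration; Airflow raises its own exception type when a connection ID cannot be found.

```python
from urllib.parse import urlsplit


class ConnectionNotDefined(LookupError):
    """Hypothetical error for this sketch; Airflow uses its own exception."""


def lookup_connection(conn_id, environ):
    """Resolve a conn_id to its settings, or fail if it was never defined."""
    key = f"AIRFLOW_CONN_{conn_id.upper()}"
    if key not in environ:
        raise ConnectionNotDefined(f"connection '{conn_id}' is not defined")
    parts = urlsplit(environ[key])
    return {"host": parts.hostname, "port": parts.port, "login": parts.username}


def describe_target(conn_id, environ):
    """Hook-like code: it cannot do anything until the lookup succeeds."""
    conn = lookup_connection(conn_id, environ)
    return f"would connect to {conn['host']}:{conn['port']} as {conn['login']}"


env = {}  # stands in for os.environ so the sketch has no side effects

# Step 1: using the hook-like code before the connection exists fails early.
try:
    outcome_before = describe_target("my_mysql_conn", env)
except ConnectionNotDefined as exc:
    outcome_before = f"failed: {exc}"

# Step 2: define the connection, then the same call succeeds.
env["AIRFLOW_CONN_MY_MYSQL_CONN"] = "mysql://user:pass@localhost:3306/testdb"
outcome_after = describe_target("my_mysql_conn", env)

print(outcome_before)
print(outcome_after)
```

The hook-like code is identical in both calls. Only the presence of the connection changes the outcome, which is why the connection has to be in place before a task runs.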