Audit logging in Apache Airflow - Time & Space Complexity
Audit logging in Airflow tracks actions like task runs and user changes. Understanding time complexity helps us see how logging effort grows as more events happen.
We want to know how the work of recording logs changes when the number of events increases.
Analyze the time complexity of the following Airflow audit logging snippet.
```python
from airflow.models import TaskInstance
from airflow.utils.session import provide_session

@provide_session
def log_task_events(task_instances: list[TaskInstance], session=None):
    # create_audit_log creates a log record for a task instance
    for ti in task_instances:
        session.add(create_audit_log(ti))
    session.commit()
```
This code logs audit entries for a list of task instances by adding each log to the database session and then committing once.
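The snippet never shows `create_audit_log`. A minimal sketch of what such a helper might look like — the `AuditLog` record and its fields are hypothetical, not part of the Airflow API — makes the per-instance O(1) work concrete:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical record type; real audit logs would map to a database table.
@dataclass
class AuditLog:
    dag_id: str
    task_id: str
    event: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def create_audit_log(ti) -> AuditLog:
    """Build one audit record from a task instance: constant work per call."""
    return AuditLog(dag_id=ti.dag_id, task_id=ti.task_id, event="task_run")
```

Each call does a fixed amount of work regardless of how many task instances exist, which is why the loop dominates the overall cost.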
Identify what repeats: loops, recursion, or array traversals.
- Primary operation: Loop over each task instance to create and add a log entry.
- How many times: Once per task instance in the input list.
Each new task instance adds one more log entry to create and add. So, the work grows directly with the number of task instances.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 log entries created and added |
| 100 | 100 log entries created and added |
| 1000 | 1000 log entries created and added |
Pattern observation: The work increases steadily and directly with the number of task instances.
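The table above can be verified with a small counting sketch (a stand-in for the real loop, with no database involved) that tallies one `session.add` per task instance:

```python
def count_log_operations(n: int) -> int:
    """Count add operations for n task instances: exactly one per instance."""
    ops = 0
    for _ in range(n):
        ops += 1  # one create_audit_log + session.add per task instance
    return ops

# Matches the table: operations grow in lockstep with input size.
for n in (10, 100, 1000):
    print(n, count_log_operations(n))
```

Doubling the input doubles the operation count, which is the defining behavior of linear, O(n), growth.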
Time Complexity: O(n)
This means the time to log audit entries grows linearly with the number of task instances.
[X] Wrong: "Audit logging time stays the same no matter how many tasks we log."
[OK] Correct: Each task needs its own log entry, so more tasks mean more work to create and store logs.
Understanding how audit logging scales helps you design systems that keep track of actions efficiently as they grow. This skill shows you can think about real-world system behavior.
"What if we logged audit entries asynchronously in batches? How would the time complexity change?"