Creating custom operators in Apache Airflow - Performance & Efficiency
When we create custom operators in Airflow, we want to understand how their runtime grows as the amount of work they handle grows.
We ask: how does the time to run the operator change when it processes more data or more steps?
Analyze the time complexity of the following custom operator code snippet.
```python
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class MyCustomOperator(BaseOperator):
    @apply_defaults  # a no-op since Airflow 2.0; kept for compatibility with older examples
    def __init__(self, items, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.items = items

    def execute(self, context):
        for item in self.items:
            self.process_item(item)

    def process_item(self, item):
        # Simulate processing a single item
        pass
```
This operator processes a list of items one by one during execution.
Look for loops or repeated steps in the code.
- Primary operation: the `for` loop in `execute` that processes each item.
- How many times: once for every item in the `items` list.
As the number of items grows, the operator runs process_item more times.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 calls to process_item |
| 100 | 100 calls to process_item |
| 1000 | 1000 calls to process_item |
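The pattern in the table can be checked with a small pure-Python sketch (no Airflow required) that counts how many times a stand-in for `process_item` would be called:

```python
# Stand-in for the operator's execute() loop: count calls instead of doing work.
def count_process_calls(items):
    calls = 0
    for _ in items:   # same structure as the for loop in execute()
        calls += 1    # one process_item call per item
    return calls

for n in (10, 100, 1000):
    print(n, count_process_calls(range(n)))  # prints 10 10, 100 100, 1000 1000
```

The count matches the input size exactly, which is what the table predicts.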
Pattern observation: The work grows directly with the number of items; doubling items doubles work.
Time Complexity: O(n)
This means the time to run the operator grows in a straight line with the number of items it processes.
[X] Wrong: "The operator runs in constant time no matter how many items there are."
[OK] Correct: Because the operator loops through each item, more items mean more work and more time.
Understanding how your custom operator scales helps you design efficient workflows and shows you can think about performance in real projects.
"What if process_item itself had a loop over a list inside it? How would the time complexity change?"