
Idempotent task design in Apache Airflow - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Why is idempotency important in Airflow tasks?

In Airflow, tasks may run multiple times due to retries or manual triggers. Why is designing tasks to be idempotent important?

A. To prevent Airflow from scheduling tasks automatically
B. To make tasks run faster by skipping all processing
C. To allow tasks to run only once ever without retries
D. To ensure tasks produce the same result and avoid side effects when run multiple times
💡 Hint

Think about what happens if a task runs more than once and changes data each time.
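
As a rough illustration of the hint, the sketch below contrasts an append-style write with an overwrite keyed by a run date. The function names, the file layout, and the `run_date` parameter are assumptions made for this example and are not part of Airflow's API.

```python
import tempfile
from pathlib import Path


def append_rows(out_dir: Path, rows: list[str]) -> Path:
    """Non-idempotent: every run appends, so a retry duplicates the rows."""
    target = out_dir / "events.csv"
    with target.open("a") as f:
        f.writelines(row + "\n" for row in rows)
    return target


def overwrite_partition(out_dir: Path, run_date: str, rows: list[str]) -> Path:
    """Idempotent: each run replaces the file that belongs to its run date."""
    target = out_dir / f"events_{run_date}.csv"
    target.write_text("".join(row + "\n" for row in rows))
    return target


def demo() -> tuple[int, int]:
    """Run each writer twice and return the resulting line counts."""
    rows = ["a,1", "b,2"]
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp)
        for _ in range(2):  # simulate a task run followed by a retry
            appended = append_rows(out_dir, rows)
            replaced = overwrite_partition(out_dir, "2024-01-01", rows)
        return (
            len(appended.read_text().splitlines()),
            len(replaced.read_text().splitlines()),
        )
```

Calling `demo()` returns `(4, 2)`: the appended file has doubled after the second run, while the overwritten file is unchanged.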

💻 Command Output
intermediate
Output of an Airflow task with idempotent file creation

Consider an Airflow PythonOperator task that creates a file only if it does not exist. What will be the output if the task runs twice?

import os
file_path = '/tmp/data.txt'
if not os.path.exists(file_path):
    with open(file_path, 'w') as f:
        f.write('Airflow data')
print('File created or already exists')
A.
File created
File not created

B.
File created or already exists
File created or already exists

C.
File created
File created

D.
File not created
💡 Hint

Check what happens when the file already exists on the second run.
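
The check-then-write pattern in the snippet leaves a short window between `os.path.exists` and `open` in which another process could create the file. One possible alternative, sketched here on the assumption that Python's exclusive-create mode (`'x'`) suits the use case, lets the open call itself decide. `create_once` is a hypothetical helper name.

```python
import os
import tempfile


def create_once(file_path: str, content: str) -> bool:
    """Create the file if it is missing; return True only if this call wrote it."""
    try:
        # Mode 'x' raises FileExistsError instead of overwriting an existing file.
        with open(file_path, "x") as f:
            f.write(content)
        return True
    except FileExistsError:
        return False


def demo() -> list[bool]:
    """Call create_once twice on the same path, as a run plus a retry would."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        return [create_once(path, "Airflow data") for _ in range(2)]
```

`demo()` returns `[True, False]`. This sketch does not handle a crash in the middle of the write, which would leave a partial file that later runs treat as complete.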

Configuration
advanced
Idempotent database update in Airflow task

You want an Airflow task to update a database record only if it is not already in the desired state, so that retries do not repeat the update. Which SQL statement achieves this?

A. UPDATE users SET status='active' WHERE id=123 AND status!='active';
B. UPDATE users SET status='active' WHERE id=123;
C. INSERT INTO users (id, status) VALUES (123, 'active');
D. DELETE FROM users WHERE id=123;
💡 Hint

Think about how to avoid changing the record if it already has the desired status.
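
To see how a guarded update behaves across retries, the following sketch runs one against an in-memory SQLite database and reports how many rows each attempt changed. The table schema, the starting status `'pending'`, and the helper name `activate_user` are assumptions for illustration only.

```python
import sqlite3


def activate_user(conn: sqlite3.Connection, user_id: int) -> int:
    """Set status to 'active' unless it already is; return the rows changed."""
    cur = conn.execute(
        "UPDATE users SET status = 'active' WHERE id = ? AND status != 'active'",
        (user_id,),
    )
    conn.commit()
    return cur.rowcount


def demo() -> list[int]:
    """Apply the guarded update twice to the same row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute("INSERT INTO users (id, status) VALUES (123, 'pending')")
    counts = [activate_user(conn, 123) for _ in range(2)]
    conn.close()
    return counts
```

`demo()` returns `[1, 0]`: the first attempt changes the row and the second finds nothing left to change. The column is declared `NOT NULL` here because a `NULL` status would not satisfy `status != 'active'` and the row would be skipped.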

Troubleshoot
advanced
Troubleshooting non-idempotent Airflow task causing duplicate data

An Airflow task inserts rows into a table every time it runs, causing duplicates on retries. What is the best way to fix this?

A. Modify the task to check if the row exists before inserting
B. Increase the retry delay to avoid duplicates
C. Disable retries for the task
D. Run the task only once manually
💡 Hint

How can you prevent inserting the same data multiple times?
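
One way to make a repeated insert harmless is to let the database enforce uniqueness rather than checking in application code. The sketch below assumes SQLite and its `INSERT OR IGNORE` form; the `events` table, its `event_id` key, and `load_rows` are hypothetical, and other databases spell the same idea differently.

```python
import sqlite3


def load_rows(conn: sqlite3.Connection, rows: list[tuple[str, str]]) -> None:
    """Insert rows, silently skipping any whose primary key already exists."""
    conn.executemany(
        "INSERT OR IGNORE INTO events (event_id, payload) VALUES (?, ?)",
        rows,
    )
    conn.commit()


def demo() -> int:
    """Load the same batch twice and return the final row count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event_id TEXT PRIMARY KEY, payload TEXT)")
    rows = [("e1", "signup"), ("e2", "login")]
    for _ in range(2):  # simulate a task run followed by a retry
        load_rows(conn, rows)
    (count,) = conn.execute("SELECT COUNT(*) FROM events").fetchone()
    conn.close()
    return count
```

`demo()` returns `2`, since the second load finds both keys already present. This only works when each row has a stable, deterministic key, which is an assumption about the data.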

🔀 Workflow
expert
Designing an idempotent Airflow DAG with external API calls

You have an Airflow DAG that calls an external API to create resources. The API does not support idempotency keys. How can you design the DAG to avoid creating duplicate resources on retries?

A. Ignore failures and manually clean duplicates later
B. Increase the task timeout to avoid retries
C. Store API response IDs in XCom and check before retrying to skip duplicate calls
D. Use multiple tasks to call the API in parallel
💡 Hint

Think about how to remember what was created in previous attempts.
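
The idea of remembering earlier attempts can be sketched as a lookup before the create call. Everything here is a stand-in: `FakeApi` imitates an external API without idempotency keys, and the plain `dict` plays the role of whatever persistent store the DAG uses. Whether a given store survives a retry depends on the Airflow version and configuration, so that is worth verifying for a real deployment.

```python
from typing import Callable


class FakeApi:
    """Hypothetical external API: every call creates a new resource."""

    def __init__(self) -> None:
        self.created: list[str] = []

    def create_resource(self, name: str) -> str:
        resource_id = f"res-{len(self.created) + 1}"
        self.created.append(resource_id)
        return resource_id


def create_once(store: dict[str, str], key: str, create: Callable[[], str]) -> str:
    """Return the ID recorded under `key`, calling `create` only if none exists."""
    if key in store:
        return store[key]
    resource_id = create()
    store[key] = resource_id  # record the result so later attempts can skip the call
    return resource_id


def demo() -> tuple[list[str], int]:
    """Attempt the same creation twice and report the IDs and the API call count."""
    api = FakeApi()
    store: dict[str, str] = {}
    key = "example_dag/create_resource/2024-01-01"  # assumed per-run key
    ids = [
        create_once(store, key, lambda: api.create_resource("report"))
        for _ in range(2)
    ]
    return ids, len(api.created)
```

`demo()` returns `(['res-1', 'res-1'], 1)`: both attempts see the same ID and the API is called once. A gap remains if the task fails after the API call but before the ID is recorded, so this reduces duplicates without fully ruling them out.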