In Airflow, tasks may run multiple times due to retries or manual triggers. Why is designing tasks to be idempotent important?
Think about what happens if a task runs more than once and changes data each time.
Idempotency guarantees that running a task multiple times produces the same result as running it once, with no unintended side effects or duplicate data. This is crucial in Airflow, where retries, backfills, and manual reruns are routine.
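The contrast can be sketched with two plain-Python "task" bodies run twice; this is a minimal illustration, not Airflow code, and the function and key names are made up for the example.

```python
# Minimal sketch (not Airflow-specific): the same task body run twice.
state = {'rows': []}

def append_row_task():
    # Non-idempotent: every run appends, so a retry duplicates the data.
    state['rows'].append('order-42')

def upsert_row_task():
    # Idempotent: a rerun leaves the state unchanged.
    if 'order-42' not in state['rows']:
        state['rows'].append('order-42')

append_row_task()
append_row_task()
print(state['rows'])    # ['order-42', 'order-42'] -- duplicated after a retry

state['rows'].clear()
upsert_row_task()
upsert_row_task()
print(state['rows'])    # ['order-42'] -- a retry changes nothing
```

The second version can be re-run any number of times and the end state is always the same, which is exactly the property retries require.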
Consider an Airflow PythonOperator task that creates a file only if it does not exist. What will be the output if the task runs twice?
```python
import os

file_path = '/tmp/data.txt'
if not os.path.exists(file_path):
    with open(file_path, 'w') as f:
        f.write('Airflow data')
print('File created or already exists')
```
Check what happens when the file already exists on the second run.
The code checks whether the file exists before creating it. On the first run it creates the file; on the second run the file already exists, so creation is skipped and only the message is printed. The end state is identical either way, which makes the task idempotent.
You want an Airflow task to update a database record only if a specific condition is met, avoiding duplicate updates on retries. Which SQL statement ensures idempotency?
Think about how to avoid changing the record if it already has the desired status.
Option A updates the record only when its status is not already 'active'. The first run changes the row; any retry matches zero rows and changes nothing, which makes the update idempotent.
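The conditional-update pattern can be demonstrated with an in-memory SQLite database; the table and column names here are illustrative assumptions, not taken from the question's options.

```python
# Sketch of a conditional UPDATE that is safe to retry.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, status TEXT, updated_count INTEGER)"
)
conn.execute("INSERT INTO records VALUES (1, 'pending', 0)")

def activate(record_id):
    # The WHERE clause makes the update a no-op once status is 'active'.
    cur = conn.execute(
        "UPDATE records SET status = 'active', updated_count = updated_count + 1 "
        "WHERE id = ? AND status != 'active'",
        (record_id,),
    )
    return cur.rowcount  # number of rows actually changed

print(activate(1))  # 1 -- first run updates the row
print(activate(1))  # 0 -- retry matches nothing, no duplicate effect
```

Because the guard lives in the SQL itself, the task stays idempotent even if the retry happens on a different worker with no memory of the first attempt.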
An Airflow task inserts rows into a table every time it runs, causing duplicates on retries. What is the best way to fix this?
How can you prevent inserting the same data multiple times?
Checking for existing rows before inserting (or using an upsert such as `INSERT ... ON CONFLICT`) ensures the task does not create duplicates on retries, making it idempotent.
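One common realization of this fix is to give each row a natural unique key and let the database reject duplicates. A minimal sketch using SQLite's `INSERT OR IGNORE` (table and column names are illustrative):

```python
# Retry-safe inserts: a unique key plus INSERT OR IGNORE (SQLite syntax).
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE events (event_id TEXT PRIMARY KEY, payload TEXT)")

def load_task():
    # A rerun re-issues the same insert, but the primary key makes it a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO events (event_id, payload) VALUES (?, ?)",
        ('run-2024-01-01', 'daily totals'),
    )

load_task()
load_task()  # simulated retry
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 1 -- no duplicate row despite two runs
```

Deriving the key from the logical run (for example the DAG's execution date) is what ties "same run" to "same row" and keeps retries harmless.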
You have an Airflow DAG that calls an external API to create resources. The API does not support idempotency keys. How can you design the DAG to avoid creating duplicate resources on retries?
Think about how to remember what was created in previous attempts.
By storing the IDs of created resources in XCom, the task can check on a later attempt whether the resource was already created and skip the duplicate API call, achieving idempotency even though the API itself offers no idempotency keys.
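The check-before-create pattern can be sketched in plain Python. Airflow's real XCom is backed by the metadata database and accessed via `ti.xcom_push` / `ti.xcom_pull`; here a dict stands in for it, and `create_resource()` is a hypothetical stand-in for the external API call.

```python
# Sketch of check-before-create with a remembered resource ID.
created = []                       # records every simulated API call

def create_resource(name):
    created.append(name)           # pretend external API call (side effect)
    return f'id-for-{name}'

xcom = {}                          # stand-in for Airflow's XCom store

def create_if_needed(name):
    key = f'resource_{name}'
    if key in xcom:                # already created on an earlier attempt
        return xcom[key]
    resource_id = create_resource(name)
    xcom[key] = resource_id        # remember it for future retries
    return resource_id

first = create_if_needed('bucket-a')
second = create_if_needed('bucket-a')   # simulated retry
print(first == second, len(created))    # True 1 -- only one API call made
```

The essential move is that the record of "what was already created" lives outside the task attempt, so a retry can consult it instead of blindly calling the API again.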