Challenge - 5 Problems

Schema Mastery in Airflow

🧠 Conceptual

intermediate

Why use schema validation in Airflow data pipelines?

In data pipelines managed by Airflow, why is it important to validate schema changes before processing data?

A. To prevent pipeline failures caused by unexpected data format changes

B. To speed up the execution of Airflow DAGs by skipping data checks

C. To automatically fix data errors without human intervention

D. To reduce the number of tasks in the DAG by merging schema checks with data loading


💻 Command Output

intermediate

Output of Airflow task with schema mismatch

What is the result when an Airflow PythonOperator task runs a function that raises a ValueError because of a schema mismatch?

Apache Airflow

def check_schema(data):
    if 'id' not in data:
        raise ValueError('Missing id field')

check_schema({'name': 'Alice'})

A. Task retries automatically without error

B. Task succeeds and logs 'Schema valid'

C. Task skips execution and marks as success

D. Task fails with ValueError: Missing id field
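
The function in this question can be exercised outside Airflow to see what it raises. The following is a minimal sketch in plain Python, not Airflow code: `run_callable` is a hypothetical stand-in for a task runner, introduced only for illustration. It relies on the fact that an uncaught exception from a PythonOperator callable fails that task attempt, and that a retry happens only if the task has `retries` configured.

```python
def check_schema(data):
    """Raise ValueError when the required 'id' field is absent."""
    if 'id' not in data:
        raise ValueError('Missing id field')


def run_callable(func, *args):
    """Hypothetical stand-in for a task runner: report the outcome as a string."""
    try:
        func(*args)
    except Exception as exc:
        # A real task runner would log the traceback and fail the attempt here
        return f'failed: {type(exc).__name__}: {exc}'
    return 'success'


print(run_callable(check_schema, {'name': 'Alice'}))
print(run_callable(check_schema, {'id': 1, 'name': 'Alice'}))
```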


🔀 Workflow

advanced

Best workflow to handle schema changes in Airflow

Which Airflow workflow best handles schema changes in a data pipeline to minimize downtime?

A. Add a schema validation task before data processing and alert on failures

B. Skip schema validation and rely on downstream tasks to handle errors

C. Disable all schema checks to speed up pipeline execution

D. Run data processing tasks first, then validate schema after completion
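
The idea of validating before processing and alerting on failure can be sketched in plain Python. Everything here is an assumption made for illustration: the required fields, the `send_alert` helper (which just appends to a list), and the `run_pipeline` function that stands in for task ordering. It is not taken from any particular Airflow deployment.

```python
REQUIRED_FIELDS = {'id', 'name'}  # assumed schema, for illustration only


def validate_schema(records):
    """Fail fast if any record lacks a required field."""
    for i, rec in enumerate(records):
        missing = REQUIRED_FIELDS - set(rec)
        if missing:
            raise ValueError(f'Record {i} is missing fields: {sorted(missing)}')
    return records


def send_alert(message, outbox):
    """Hypothetical alert hook; a real one might send email or a chat message."""
    outbox.append(message)


def process(records):
    """Placeholder processing step that assumes the schema has been checked."""
    return [rec['name'].upper() for rec in records]


def run_pipeline(records, outbox):
    """Validate first; alert and stop before processing if validation fails."""
    try:
        validated = validate_schema(records)
    except ValueError as exc:
        send_alert(f'Schema validation failed: {exc}', outbox)
        return None
    return process(validated)


alerts = []
print(run_pipeline([{'id': 1, 'name': 'alice'}], alerts))
print(run_pipeline([{'name': 'bob'}], alerts), alerts)
```

In an actual DAG the two steps would typically be separate tasks with the validation task upstream (for example `validate >> process`, where both names are placeholders), and the alert could be attached through the task's `on_failure_callback` argument.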


❓ Troubleshoot

advanced

Troubleshooting schema mismatch errors in Airflow logs

You see this error in Airflow logs: KeyError: 'user_id'. What is the most likely cause?

A. The Airflow webserver is not running, so logs are incomplete

B. The input data is missing the 'user_id' field expected by the task

C. The DAG file has syntax errors causing task failures

D. The Airflow scheduler is down and cannot assign tasks
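
A `KeyError` of this form is ordinary Python behavior and can be reproduced without Airflow. The sketch below uses two hypothetical helper functions: one indexes a record directly, the other checks for the field first so the log message names the fields that were actually present.

```python
def extract_user_id(record):
    """Direct indexing raises KeyError when the key is absent."""
    return record['user_id']


def extract_user_id_checked(record):
    """Check first so the error message is more descriptive."""
    if 'user_id' not in record:
        raise ValueError(
            f"Input record is missing 'user_id'; found fields: {sorted(record)}"
        )
    return record['user_id']


try:
    extract_user_id({'name': 'Alice'})
except KeyError as exc:
    print(f'KeyError: {exc}')  # prints: KeyError: 'user_id'

try:
    extract_user_id_checked({'name': 'Alice'})
except ValueError as exc:
    print(f'ValueError: {exc}')
```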


✅ Best Practice

expert

Handling evolving schemas in Airflow pipelines

What is the best practice to handle evolving data schemas in Airflow pipelines to ensure smooth updates?

A. Ignore schema changes and fix errors as they appear during pipeline runs

B. Hardcode schema fields in tasks and update DAGs only when schema changes

C. Implement schema versioning and use conditional branching in DAGs to process different versions

D. Disable schema validation to avoid pipeline failures during schema evolution
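
Schema versioning with conditional branching can be sketched as a function that inspects a version marker and returns the name of the branch to follow, which is the shape of callable that Airflow's BranchPythonOperator expects (it returns a task_id). The `schema_version` field, the two record layouts, and the task names below are all hypothetical, chosen only to illustrate the pattern.

```python
# Hypothetical mapping from schema version to the task_id that handles it
BRANCH_BY_VERSION = {1: 'process_v1', 2: 'process_v2'}


def choose_branch(record):
    """Return the task_id to follow, as a branching callable would."""
    version = record.get('schema_version', 1)  # assume unversioned data is v1
    if version not in BRANCH_BY_VERSION:
        raise ValueError(f'Unsupported schema_version: {version}')
    return BRANCH_BY_VERSION[version]


def process_v1(record):
    """v1 (assumed layout): a single 'name' field."""
    return {'id': record['id'], 'full_name': record['name']}


def process_v2(record):
    """v2 (assumed layout): 'name' split into 'first_name' and 'last_name'."""
    return {'id': record['id'],
            'full_name': f"{record['first_name']} {record['last_name']}"}


PROCESSORS = {'process_v1': process_v1, 'process_v2': process_v2}

old = {'id': 1, 'name': 'Alice Smith'}
new = {'schema_version': 2, 'id': 1, 'first_name': 'Alice', 'last_name': 'Smith'}
for rec in (old, new):
    branch = choose_branch(rec)
    print(branch, PROCESSORS[branch](rec))
```

Both versions are normalized into the same output shape, so downstream steps do not need to know which schema version the input used.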
