
Apache Airflow for ML orchestration in MLOps - Step-by-Step Execution

Process Flow - Apache Airflow for ML orchestration
Define DAG and tasks
Schedule DAG run
Trigger task 1: Data ingestion
Trigger task 2: Data preprocessing
Trigger task 3: Model training
Trigger task 4: Model evaluation
Trigger task 5: Model deployment
DAG run complete
Airflow runs a Directed Acyclic Graph (DAG) of tasks in order, scheduling and executing each step of the ML workflow automatically.
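The sequential run above can be sketched in plain Python (no Airflow required). This is only an illustration of the ordering and status transitions shown in the tables below, not the Airflow scheduler itself; the task names mirror the process flow.

```python
# Minimal sketch of how a scheduler walks a linear DAG: each task
# moves to 'running', then 'success', before the next task starts.
# Pure-Python illustration -- not Airflow's actual scheduler.

TASKS = ["ingestion", "preprocessing", "training", "evaluation", "deployment"]

def run_dag(tasks):
    """Execute tasks in order, recording every status transition."""
    history = []
    for task in tasks:
        history.append((task, "running"))
        # ... the real work for this step would happen here ...
        history.append((task, "success"))
    return history

if __name__ == "__main__":
    for task, status in run_dag(TASKS):
        print(f"{task}: {status}")
```

Running this prints the same running-then-success progression that the Status Tracker table records for the two-task example.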
Execution Sample
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def ingest():
    print('Data ingested')

def preprocess():
    print('Data preprocessed')

# schedule_interval still works; Airflow 2.4+ prefers the `schedule` argument.
with DAG('ml_pipeline', start_date=datetime(2024, 1, 1), schedule_interval='@daily') as dag:
    t1 = PythonOperator(task_id='ingest', python_callable=ingest)
    t2 = PythonOperator(task_id='preprocess', python_callable=preprocess)
    t1 >> t2  # preprocess runs only after ingest succeeds
Defines a simple Airflow DAG with two tasks, ingest and preprocess; the `t1 >> t2` dependency ensures preprocess runs only after ingest succeeds.
Process Table
Step | Task Triggered | Task Status | Output            | Next Task
-----|----------------|-------------|-------------------|-------------
1    | ingest         | running     | Data ingested     | preprocess
2    | ingest         | success     | Data ingested     | preprocess
3    | preprocess     | running     | Data preprocessed | none
4    | preprocess     | success     | Data preprocessed | DAG complete
💡 All tasks completed successfully; the DAG run finished.
Status Tracker
Variable                 | Start | After Step 1 | After Step 2 | After Step 3 | After Step 4 | Final
-------------------------|-------|--------------|--------------|--------------|--------------|--------
Task Status (ingest)     | none  | running      | success      | success      | success      | success
Task Status (preprocess) | none  | none         | none         | running      | success      | success
DAG Run Status           | none  | running      | running      | running      | running      | success
Key Moments - 2 Insights
Why does the 'preprocess' task wait before running?
Because 'preprocess' starts only after 'ingest' shows 'success' at Step 2 in the execution table, demonstrating task dependencies.
What happens if a task fails during execution?
The DAG run stops or retries depending on configuration; here, all tasks succeed as shown by 'success' status in the table.
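Airflow configures this per task (for example via the `retries` and `retry_delay` operator arguments). A rough sketch of the retry loop in plain Python, purely to illustrate the behavior rather than Airflow's internals:

```python
# Rough sketch of task-level retries: attempt the callable up to
# 1 + retries times, marking 'failed' only when every attempt fails.
# Illustrative only -- Airflow's executor adds delays, logging, alerts, etc.

def run_with_retries(task_callable, retries=2):
    """Return 'success' if any attempt succeeds, else 'failed'."""
    for attempt in range(1 + retries):
        try:
            task_callable()
            return "success"
        except Exception:
            continue  # a real scheduler would wait retry_delay here
    return "failed"
```

Under this model, a flaky task that succeeds on a later attempt still yields a successful DAG run; only a task that exhausts its retries halts the pipeline.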
Visual Quiz - 3 Questions
Test your understanding
Looking at the execution table, what is the status of 'ingest' at Step 2?
A. running
B. failed
C. success
D. queued
💡 Hint
Check the 'Task Status' column for 'ingest' at Step 2 in the execution table.
At which step does the 'preprocess' task start running?
A. Step 3
B. Step 2
C. Step 1
D. Step 4
💡 Hint
Look for the step where 'preprocess' first shows status 'running' in the execution table.
If the 'ingest' task failed at Step 2, what would happen to the DAG run?
A. The DAG would continue to 'preprocess' anyway
B. The DAG run would stop or retry 'ingest'
C. The DAG would skip 'preprocess' and finish
D. The DAG would restart from the beginning
💡 Hint
Refer to the Key Moments explanation of how a task failure affects DAG execution.
Concept Snapshot
Apache Airflow runs ML workflows as DAGs.
Define tasks and their order with operators.
Airflow schedules and runs tasks automatically.
Tasks run only after dependencies succeed.
Failures can stop or retry tasks.
Use PythonOperators for Python code tasks.
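Dependencies need not be linear: Airflow lets you fan out and back in, e.g. `t1 >> [t2, t3] >> t4`. A small sketch of the "run only after all upstream tasks succeed" rule (this mirrors Airflow's default `all_success` trigger rule; the fan-out task names are hypothetical):

```python
# Sketch of dependency gating: a task becomes runnable only once every
# upstream task has succeeded. Pure Python, not Airflow itself.

def execution_order(upstreams):
    """upstreams maps task -> set of tasks that must succeed first."""
    done, order = set(), []
    while len(done) < len(upstreams):
        ready = [t for t, deps in upstreams.items()
                 if t not in done and deps <= done]
        if not ready:
            raise ValueError("cycle detected: not a DAG")
        for task in sorted(ready):  # deterministic tie-break
            done.add(task)
            order.append(task)
    return order

# Hypothetical fan-out pipeline: ingest >> [train, validate] >> deploy
deps = {
    "ingest": set(),
    "train": {"ingest"},
    "validate": {"ingest"},
    "deploy": {"train", "validate"},
}
```

Here `deploy` cannot run until both `train` and `validate` have succeeded, which is exactly why the graph must be acyclic: a cycle would mean some task can never become ready.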
Full Transcript
Apache Airflow helps automate ML workflows by defining tasks in a DAG. Each task runs in order, waiting for its dependencies to finish. For example, data ingestion runs first, then preprocessing. The execution table shows each task's status changing from running to success. If a task fails, the DAG run may stop or retry it. This visual trace shows step by step how Airflow manages ML pipelines.