
Pipeline components and DAGs in MLOps - Step-by-Step Execution

Process Flow - Pipeline components and DAGs
Define pipeline components
Arrange components as tasks
Create DAG with dependencies
Schedule and run pipeline
Monitor task execution and results
Complete pipeline run
This flow shows how pipeline components become tasks arranged in a DAG, which is then scheduled and executed step by step.
Execution Sample
MLOps
task1 = load_data()            # no dependencies; produces raw data
task2 = preprocess(task1)      # depends on load_data
task3 = train_model(task2)     # depends on preprocess
task4 = evaluate(task3)        # depends on train_model
pipeline = DAG([task1, task2, task3, task4])
pipeline.run()
This code defines four tasks as pipeline components, arranges them in a DAG with dependencies, and runs the pipeline.
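The `DAG` API above is schematic rather than a specific library. A minimal pure-Python sketch of such a runner (all class and function names here are hypothetical) shows how dependency-ordered execution can work:

```python
# Minimal sketch of a DAG runner (hypothetical API, not a real library).
class Task:
    def __init__(self, name, fn, deps=()):
        self.name, self.fn, self.deps = name, fn, list(deps)

class DAG:
    def __init__(self, tasks):
        self.tasks = tasks

    def run(self):
        results = {}
        pending = list(self.tasks)
        while pending:
            for task in pending:
                # A task is ready once every dependency has produced output.
                if all(d.name in results for d in task.deps):
                    results[task.name] = task.fn(*(results[d.name] for d in task.deps))
                    pending.remove(task)
                    break
            else:
                raise RuntimeError("cycle or unsatisfiable dependency")
        return results

# Mirror the four pipeline components from the example above.
t1 = Task("load_data", lambda: "raw")
t2 = Task("preprocess", lambda d: d + ":clean", deps=[t1])
t3 = Task("train_model", lambda d: d + ":model", deps=[t2])
t4 = Task("evaluate", lambda m: m + ":metrics", deps=[t3])

pipeline = DAG([t4, t2, t1, t3])  # declaration order doesn't matter
print(pipeline.run()["evaluate"])  # raw:clean:model:metrics
```

Note that the tasks are deliberately passed to `DAG` out of order: the runner still executes them in dependency order, which is exactly the guarantee a DAG scheduler provides.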
Process Table
| Step | Task | Status Before | Action | Status After | Output |
|------|------|---------------|--------|--------------|--------|
| 1 | load_data | Not started | Start task | Completed | Raw data loaded |
| 2 | preprocess | Waiting for load_data | Start after load_data | Completed | Data cleaned |
| 3 | train_model | Waiting for preprocess | Start after preprocess | Completed | Model trained |
| 4 | evaluate | Waiting for train_model | Start after train_model | Completed | Model evaluated |
| 5 | pipeline | Running | All tasks completed | Completed | Pipeline run successful |
💡 All tasks completed in order, respecting their dependencies, and the pipeline run ended successfully.
Status Tracker
| Variable | Start | After Step 1 | After Step 2 | After Step 3 | After Step 4 | Final |
|----------|-------|--------------|--------------|--------------|--------------|-------|
| task1_status | Not started | Completed | Completed | Completed | Completed | Completed |
| task2_status | Not started | Not started | Completed | Completed | Completed | Completed |
| task3_status | Not started | Not started | Not started | Completed | Completed | Completed |
| task4_status | Not started | Not started | Not started | Not started | Completed | Completed |
| pipeline_status | Running | Running | Running | Running | Running | Completed |
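The status transitions in the tracker can be reproduced with a small simulation (a sketch, assuming each step completes exactly one task in order):

```python
# Simulate the status tracker: each step completes one task in sequence.
tasks = ["task1", "task2", "task3", "task4"]
status = {t: "Not started" for t in tasks}
history = [dict(status)]          # snapshot for the "Start" column
for t in tasks:                   # steps 1 through 4
    status[t] = "Completed"
    history.append(dict(status))  # snapshot after each step

print(history[2]["task2"])  # after step 2 -> Completed
print(history[1]["task2"])  # after step 1 -> Not started
```

Each snapshot in `history` corresponds to one column of the tracker table, which is why exactly one task flips to 'Completed' per step.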
Key Moments - 3 Insights
Why does 'preprocess' wait before starting?
'preprocess' depends on 'load_data' completing first, as shown in row 2 of the process table, where the status before is 'Waiting for load_data'. This ensures the data is ready before cleaning begins.
Can tasks run in any order?
No. Tasks run in the order defined by the dependencies in the DAG. For example, 'train_model' waits for 'preprocess' to finish (row 3 of the process table). This prevents errors caused by missing inputs.
What happens if a task fails?
If a task fails, downstream tasks depending on it won't start. This is not shown here but is a key safety feature of DAGs to avoid running tasks with missing data.
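This failure behavior can be sketched with a small status model (hypothetical statuses and helper, building on the same dependency idea):

```python
# Sketch: when a task fails, every downstream task is skipped (hypothetical model).
def run_with_failures(order, deps, fails=()):
    """order: task names in execution order; deps: {task: [upstream tasks]}."""
    status = {}
    for t in order:
        if any(status.get(u) != "success" for u in deps.get(t, [])):
            status[t] = "skipped"   # an upstream task failed or was skipped
        elif t in fails:
            status[t] = "failed"
        else:
            status[t] = "success"
    return status

deps = {"preprocess": ["load_data"],
        "train_model": ["preprocess"],
        "evaluate": ["train_model"]}
order = ["load_data", "preprocess", "train_model", "evaluate"]

print(run_with_failures(order, deps, fails={"load_data"}))
# load_data fails, so preprocess, train_model, and evaluate are all skipped
```

The skip cascades: 'train_model' is skipped not because its own upstream failed, but because that upstream was itself skipped, which is how real orchestrators keep broken data from propagating.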
Visual Quiz - 3 Questions
Test your understanding
Look at the process table: what is the status of 'train_model' before step 3?
ANot started
BWaiting for preprocess
CCompleted
DRunning
💡 Hint
Check the 'Status Before' column for 'train_model' at step 3 in the process table
At which step does the pipeline status change to 'Completed'?
AStep 4
BStep 3
CStep 5
DStep 2
💡 Hint
Look at the 'pipeline' row in the process table and find the step where the status changes to 'Completed'
If 'load_data' fails, what happens to 'preprocess'?
AIt waits indefinitely
BIt runs anyway
CIt starts immediately
DIt skips and pipeline completes
💡 Hint
Refer to the Key Moments explanation about task dependencies and waiting
Concept Snapshot
Pipeline components are tasks arranged in a Directed Acyclic Graph (DAG).
Each task depends on outputs of previous tasks.
The DAG ensures tasks run in order respecting dependencies.
The pipeline runs by executing tasks step by step.
Failures stop downstream tasks to keep data safe.
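The ordering guarantee in the snapshot is, concretely, a topological sort of the DAG. A minimal sketch using Kahn's algorithm (the helper name and dependency format are assumptions for illustration):

```python
from collections import deque

def topological_order(deps):
    """deps maps each task to the tasks it depends on; Kahn's algorithm."""
    indegree = {t: len(us) for t, us in deps.items()}
    downstream = {t: [] for t in deps}
    for t, us in deps.items():
        for u in us:
            downstream[u].append(t)
    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for nxt in downstream[t]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(deps):
        raise ValueError("cycle detected: not a DAG")
    return order

deps = {"load_data": [], "preprocess": ["load_data"],
        "train_model": ["preprocess"], "evaluate": ["train_model"]}
print(topological_order(deps))
# ['load_data', 'preprocess', 'train_model', 'evaluate']
```

The "acyclic" part of DAG is what makes this possible: if the graph contained a cycle, no task in the cycle would ever become ready, which the length check above detects.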
Full Transcript
This visual execution shows how pipeline components become tasks arranged in a DAG. The code example defines four tasks: load_data, preprocess, train_model, and evaluate. Each task waits for its dependencies to complete before starting. The process table traces each step, showing task statuses before and after running. Status variables track task states changing from 'Not started' to 'Completed'. The key moments clarify why tasks wait for others and how the DAG controls execution order. The quiz tests understanding of task statuses and pipeline completion. The snapshot summarizes the core idea: pipelines use DAGs to run tasks in order, safely.