Pipeline components and DAGs in MLOps - Commands & Configuration

Introduction

When you build machine learning workflows, you need to organize tasks so they run in the right order. Pipeline components are the building blocks of these workflows, and a DAG (Directed Acyclic Graph) describes how the tasks depend on one another, with no loops. They are useful in situations like these:

When you want to automate data preprocessing, model training, and evaluation steps in one flow

When you need to run tasks in a specific order and handle dependencies between them

When you want to reuse parts of your workflow as separate components for different projects

When you want to visualize the flow of your machine learning pipeline clearly

When you want to schedule and monitor your ML workflows reliably
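
To make the idea of a DAG concrete, here is a minimal sketch in plain Python that does not use any pipeline framework. It assumes three hypothetical stand-in functions (preprocess, train, evaluate) and uses the standard library's graphlib.TopologicalSorter (Python 3.9+) to derive a run order from the declared dependencies. It only illustrates dependency ordering; it is not how Kubeflow Pipelines schedules work internally.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Hypothetical stand-ins for pipeline components: each one only records
# its own name in a shared log instead of doing real work.
def preprocess(log):
    log.append("preprocess")


def train(log):
    log.append("train")


def evaluate(log):
    log.append("evaluate")


TASKS = {"preprocess": preprocess, "train": train, "evaluate": evaluate}

# Each key maps to the set of tasks that must finish before it can start.
DEPENDENCIES = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
}


def run_pipeline(tasks, dependencies):
    """Run every task once, after all of its upstream tasks have run."""
    log = []
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name](log)
    return log


if __name__ == "__main__":
    print(run_pipeline(TASKS, DEPENDENCIES))
    # ['preprocess', 'train', 'evaluate']
```

Because the dependencies form a single chain, only one valid order exists. With branches in the graph, several orders can be valid, and a scheduler is free to pick any of them.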

Commands

This Python script defines three pipeline components: preprocess, train, and evaluate. It then creates a pipeline that runs these components in order using a DAG structure. Finally, it compiles the pipeline to a YAML file for deployment.

Terminal

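from kfp import compiler, dsl

# Each function decorated with @dsl.component becomes one step in the pipeline.
@dsl.component
def preprocess_op():
    print('Preprocessing data')

@dsl.component
def train_op():
    print('Training model')

@dsl.component
def evaluate_op():
    print('Evaluating model')

@dsl.pipeline(name='ml-pipeline')
def ml_pipeline():
    preprocess = preprocess_op()
    train = train_op()
    train.after(preprocess)  # train starts only after preprocess finishes
    evaluate = evaluate_op()
    evaluate.after(train)  # evaluate starts only after train finishes

if __name__ == '__main__':
    # Write the pipeline definition to a YAML file for deployment.
    compiler.Compiler().compile(pipeline_func=ml_pipeline, package_path='ml_pipeline.yaml')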

Expected Output

No output (command runs silently)

This command submits the compiled pipeline YAML to Kubeflow Pipelines and starts a run named 'run1' under the experiment 'ml-experiment', which triggers execution of the DAG. In the KFP SDK v2 CLI the subcommand is run create; the 1.x CLI named it run submit and accepts the same three options.

Terminal

kfp run create --experiment-name ml-experiment --run-name run1 --package-file ml_pipeline.yaml

Expected Output

Run submitted successfully.
Run ID: 12345
Pipeline: ml-pipeline
Experiment: ml-experiment
Run name: run1

--package-file - Specifies the compiled pipeline YAML file to run

--experiment-name - Names the experiment that groups runs

--run-name - Gives a name to this specific run

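This command lists the runs of an experiment so you can check the status of your pipeline executions. The run list subcommand filters by experiment ID rather than by experiment name, so replace <experiment-id> with the ID of 'ml-experiment' (kfp experiment list shows it).

Terminal

kfp run list --experiment-id <experiment-id>

Expected Output

Run ID   Run Name   Status      Created At
12345    run1       Succeeded   2024-06-01 10:00:00

--experiment-id - Filters runs by the ID of their parent experiment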

Key Concept

If you remember nothing else from this pattern, remember: pipeline components are tasks and DAGs define their order without loops.

Code Example

from kfp import dsl

@dsl.component
def preprocess_op():
    print('Preprocessing data')

@dsl.component
def train_op():
    print('Training model')

@dsl.component
def evaluate_op():
    print('Evaluating model')

@dsl.pipeline(name='ml-pipeline')
def ml_pipeline():
    preprocess = preprocess_op()
    train = train_op()
    train.after(preprocess)
    evaluate = evaluate_op()
    evaluate.after(train)

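if __name__ == '__main__':
    # Import the compiler submodule explicitly rather than relying on `import kfp`.
    from kfp import compiler
    compiler.Compiler().compile(pipeline_func=ml_pipeline, package_path='ml_pipeline.yaml')
    print('Pipeline compiled to ml_pipeline.yaml')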

Output

Pipeline compiled to ml_pipeline.yaml

Common Mistakes

Not specifying the order of components in the pipeline

The tasks may run in any order or in parallel, causing errors or wrong results

Use the .after() method, or pass one task's output as another task's input, to define the execution order

Creating cycles in the DAG by making tasks depend on each other circularly

DAGs must not have loops; cycles cause the pipeline to fail or hang

Ensure dependencies form a directed acyclic graph with no loops
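
Both mistakes can be checked on a plain dependency mapping before a pipeline is built. The sketch below is framework-independent and uses only the standard library's graphlib module (Python 3.9+). The helper names execution_batches and has_cycle and the three example mappings are hypothetical, chosen for illustration; this is not the validation that Kubeflow Pipelines itself performs.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+


def execution_batches(dependencies):
    """Group tasks into batches; tasks within one batch have no order between them."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the graph contains a loop
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        batches.append(ready)
        sorter.done(*ready)
    return batches


def has_cycle(dependencies):
    """Return True if the dependency mapping contains a circular dependency."""
    try:
        TopologicalSorter(dependencies).prepare()
    except CycleError:
        return True
    return False


# Mistake 1: no order declared, so all three tasks are ready at the same time.
no_order = {"preprocess": set(), "train": set(), "evaluate": set()}

# Fixed version: each task names the task it must wait for.
ordered = {"preprocess": set(), "train": {"preprocess"}, "evaluate": {"train"}}

# Mistake 2: train waits for evaluate and evaluate waits for train.
circular = {"train": {"evaluate"}, "evaluate": {"train"}}

if __name__ == "__main__":
    print(execution_batches(no_order))  # [['evaluate', 'preprocess', 'train']]
    print(execution_batches(ordered))   # [['preprocess'], ['train'], ['evaluate']]
    print(has_cycle(circular))          # True
```

A single batch containing all three tasks means nothing stops training from starting before preprocessing has finished, which is the first mistake described above.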

Summary

Define pipeline components as small tasks using the @dsl.component decorator.

Create a pipeline function that connects components in order using .after() to form a DAG.

Compile the pipeline to a YAML file and submit it to run on Kubeflow Pipelines.

List runs to monitor pipeline execution status.

Practice

1. What does a Directed Acyclic Graph (DAG) represent in an MLOps pipeline?

easy

A. Tasks and their dependencies without any cycles

B. A loop of tasks that repeat indefinitely

C. Random tasks executed in parallel without order

D. Only the final output of a pipeline
