How to Use Kubeflow Pipelines for Machine Learning Workflows
To use Kubeflow Pipelines, define your machine learning workflow as a Python function decorated with @dsl.pipeline, compile it into a pipeline package, and then upload and run it through the Kubeflow Pipelines UI or via the SDK. This automates and tracks your ML tasks in a repeatable way.
Syntax
A Kubeflow pipeline is a Python function decorated with @dsl.pipeline; each step inside it is a containerized component. The pipeline is compiled into a YAML package for deployment. (The examples below use the v1 SDK's dsl.ContainerOp API, which was removed in KFP v2.)
- @dsl.pipeline: Decorates the main function defining the workflow.
- Components: Functions or container ops representing tasks.
- Pipeline compilation: Converts Python code to a deployable YAML file.
- Client: Uploads and runs pipelines on Kubeflow server.
```python
from kfp import dsl

@dsl.pipeline(
    name='Sample Pipeline',
    description='A simple example pipeline'
)
def sample_pipeline():
    op1 = dsl.ContainerOp(
        name='echo',
        image='alpine:3.6',
        command=['echo', 'Hello Kubeflow!']
    )

if __name__ == '__main__':
    import kfp.compiler as compiler
    compiler.Compiler().compile(sample_pipeline, 'sample_pipeline.yaml')
```
Example
This example shows how to define a simple Kubeflow pipeline that prints a message, compile it, and run it using the Kubeflow Pipelines SDK.
```python
from kfp import dsl, Client

@dsl.pipeline(
    name='Hello Kubeflow',
    description='Prints a greeting message'
)
def hello_pipeline():
    echo_op = dsl.ContainerOp(
        name='echo',
        image='alpine:3.6',
        command=['echo', 'Hello Kubeflow Pipelines!']
    )

if __name__ == '__main__':
    import kfp.compiler as compiler
    compiler.Compiler().compile(hello_pipeline, 'hello_pipeline.yaml')
    client = Client()
    run = client.create_run_from_pipeline_func(hello_pipeline, arguments={})
    print(f'Pipeline run ID: {run.run_id}')
```
Output
Pipeline run ID: <some-unique-run-id>
Common Pitfalls
Common mistakes when using Kubeflow pipelines include:
- Not containerizing steps properly, causing runtime errors.
- Forgetting to compile the pipeline before uploading.
- Using incompatible Python SDK versions with the Kubeflow server.
- Not handling pipeline parameters correctly, leading to failures.
Always test components individually and verify the pipeline YAML before deployment.
```python
from kfp import dsl

# Wrong: missing the @dsl.pipeline decorator
# def my_pipeline():
#     pass

# Right:
@dsl.pipeline(name='Correct Pipeline')
def my_pipeline():
    pass
```
Quick Reference
| Kubeflow Pipelines Concept | Description |
|---|---|
| @dsl.pipeline | Decorator to define the pipeline function |
| dsl.ContainerOp | Defines a pipeline step using a container image |
| Compiler().compile() | Compiles Python pipeline to YAML file |
| Client() | Kubeflow Pipelines SDK client to run/manage pipelines |
| create_run_from_pipeline_func | Runs a pipeline function directly |
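The concepts in the table compose in a fixed order: decorate, compile, upload, run. Purely as a language-level illustration (this is a conceptual stand-in, not kfp's actual implementation), a decorator like @dsl.pipeline can be understood as attaching metadata to the function it wraps, which a compiler later reads to emit the deployable spec:

```python
# Conceptual stand-in for @dsl.pipeline: the decorator records
# metadata on the function object so a compiler can later walk the
# function and emit a deployable spec. Names here are illustrative.
def pipeline(name, description=''):
    def wrap(func):
        func._pipeline_meta = {'name': name, 'description': description}
        return func
    return wrap

@pipeline(name='Demo Pipeline', description='Illustration only')
def demo_pipeline():
    pass

print(demo_pipeline._pipeline_meta['name'])  # prints: Demo Pipeline
```

This is why the decorator is mandatory: without it, the compiler has no pipeline metadata to read, which is exactly the "Wrong" case shown in Common Pitfalls.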
Key Takeaways
- Define your ML workflow as Python functions decorated with @dsl.pipeline.
- Compile your pipeline code into a YAML package before running.
- Use the Kubeflow Pipelines SDK Client to upload and execute pipelines.
- Containerize each step properly to avoid runtime errors.
- Test components and pipeline parameters carefully before deployment.