
Kubeflow Pipelines overview in MLOps - Step-by-Step Execution

Process Flow - Kubeflow Pipelines overview
Define pipeline components
Assemble components into pipeline
Compile pipeline to YAML
Upload pipeline to Kubeflow UI
Run pipeline experiment
Pipeline executes steps in order
Monitor step status and logs
View results and outputs
Pipeline completes
Kubeflow Pipelines lets you build, run, and monitor machine learning workflows step-by-step using components assembled into a pipeline.
Execution Sample
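from kfp import dsl

# Each @dsl.component function becomes one reusable pipeline step.
@dsl.component
def preprocess():
    pass

@dsl.component
def train():
    pass

# The pipeline function wires the steps together.
@dsl.pipeline(name='simple-pipeline')
def pipeline():
    preprocess_task = preprocess()
    # .after() makes train wait until preprocess has finished.
    train_task = train().after(preprocess_task)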
Defines two steps and assembles them into a simple Kubeflow pipeline.
Process Table
| Step | Action | Evaluation | Result |
|------|--------|------------|--------|
| 1 | Define preprocess component | Function created | preprocess component ready |
| 2 | Define train component | Function created | train component ready |
| 3 | Create pipeline function | Pipeline function defined | pipeline structure ready |
| 4 | Instantiate preprocess_task | Call preprocess() | preprocess step added |
| 5 | Instantiate train_task | Call train() | train step added |
| 6 | Compile pipeline | Convert to YAML | pipeline.yaml generated |
| 7 | Upload pipeline.yaml | Upload to Kubeflow UI | Pipeline available in UI |
| 8 | Run pipeline | Start experiment | Pipeline execution started |
| 9 | Execute preprocess step | Run preprocess | Preprocess step completed |
| 10 | Execute train step | Run train | Train step completed |
| 11 | Pipeline completes | All steps done | Pipeline run successful |
💡 All pipeline steps executed successfully, so the pipeline run ends.
Status Tracker
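| Variable | Start | After Step 4 | After Step 5 | After Step 8 | After Step 11 |
|----------|-------|--------------|--------------|--------------|---------------|
| preprocess_task | None | preprocess component instance | preprocess component instance | running | completed |
| train_task | None | None | train component instance | pending | completed |
| pipeline_status | Not started | Not started | Not started | Running | Succeeded |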
Key Moments - 3 Insights
Why do we define components before assembling the pipeline?
Components are the building blocks. Defining them first (steps 1 and 2 in the Process Table) lets us reuse and organize tasks before wiring them into a pipeline.
What happens when we compile the pipeline?
Compiling (step 6) converts the Python pipeline code into a YAML file that Kubeflow understands to run the workflow.
How does Kubeflow execute the pipeline steps?
Kubeflow runs steps in order respecting dependencies (steps 9 and 10), monitoring each step's status until completion.
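To illustrate what "in order respecting dependencies" means, here is a small stand-in written with the Python standard library only. It is not how Kubeflow schedules steps internally; it simply shows how a run order can be derived from a dependency map. The `dependencies` dictionary and the `execution_order` helper are hypothetical names, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it must wait for,
# mirroring train().after(preprocess_task) in the sample.
dependencies = {
    "preprocess": set(),
    "train": {"preprocess"},
}


def execution_order(deps):
    """Return one valid run order in which every step follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)  # ['preprocess', 'train']
```

Steps with no dependency between them could appear in either order, which is why independent steps may run in parallel.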
Visual Quiz - 3 Questions
Test your understanding
Look at the Status Tracker. What is the state of 'train_task' after step 5?
A. running
B. None
C. train component instance
D. completed
💡 Hint
Check the 'train_task' row of the Status Tracker in the 'After Step 5' column.
At which step does the pipeline start running in Kubeflow?
A. Step 6
B. Step 8
C. Step 9
D. Step 11
💡 Hint
Look for 'Pipeline execution started' in the Process Table.
If the 'preprocess' step fails, what happens to the pipeline execution?
A. Pipeline stops and marks failure
B. Pipeline continues to 'train' step
C. Pipeline retries 'train' step
D. Pipeline skips 'preprocess' and completes
💡 Hint
Refer to the execution flow where steps run in order and depend on previous success.
Concept Snapshot
Kubeflow Pipelines let you build ML workflows by:
- Defining reusable components (steps)
- Assembling them into a pipeline function
- Compiling to YAML for Kubeflow
- Uploading and running via Kubeflow UI
- Monitoring step status and outputs
Steps run in order respecting dependencies.
Full Transcript
Kubeflow Pipelines help you create machine learning workflows by defining small tasks called components. You write these components as functions, then combine them into a pipeline function. This pipeline is compiled into a YAML file that Kubeflow understands. You upload this YAML to the Kubeflow UI and start an experiment to run the pipeline. Kubeflow runs each step in order, showing status and logs. When all steps finish, the pipeline run completes successfully.

Practice

1. What is the main purpose of Kubeflow Pipelines in machine learning workflows?
easy
A. To store large datasets for training
B. To create user interfaces for ML models
C. To automate and manage ML workflows with clear, reusable steps
D. To replace Kubernetes as a container platform

Solution

  1. Step 1: Understand Kubeflow Pipelines' role

    Kubeflow Pipelines are designed to automate ML workflows by defining clear steps that can be reused and tracked.
  2. Step 2: Compare options with this role

    Options describing UI creation, data storage, and replacing Kubernetes do not match this role.
  3. Final Answer:

    To automate and manage ML workflows with clear, reusable steps -> Option C
  4. Quick Check:

    Automation of ML workflows [OK]
Hint: Kubeflow Pipelines automate ML steps, not UI or storage [OK]
Common Mistakes:
  • Confusing Kubeflow Pipelines with data storage tools
  • Thinking Kubeflow replaces Kubernetes
  • Assuming it builds user interfaces
2. Which of the following is the correct way to define a step in a Kubeflow Pipeline using Python?
easy
A. def step(): return dsl.ContainerOp(name='step', image='python:3.8')
B. def step(): return dsl.ContainerOp(image='python:3.8')
C. def step(): return dsl.ContainerOp(name='step')
D. def step(): return dsl.ContainerOp(name='step', image='python:3.8', command=['python', 'script.py'])

Solution

  1. Step 1: Understand ContainerOp usage

    In the KFP SDK v1 API, ContainerOp requires a name and an image. The command is optional, but without it the container only runs the image's default entrypoint, so a step should state the command it needs.
  2. Step 2: Check each option

    Option D includes name, image, and command. Option A omits command. Option B omits name. Option C omits image and command.
  3. Final Answer:

    def step(): return dsl.ContainerOp(name='step', image='python:3.8', command=['python', 'script.py']) -> Option D
  4. Quick Check:

    A complete ContainerOp step states name, image, and command [OK]
Hint: Give ContainerOp a name, an image, and the command to run [OK]
Common Mistakes:
  • Omitting the command argument
  • Not specifying the container image
  • Missing the step name
3. Given this Kubeflow Pipeline snippet, what will be the output when the pipeline runs?
from kfp import dsl

@dsl.pipeline(name='Sample Pipeline')
def sample_pipeline():
    step1 = dsl.ContainerOp(
        name='echo-step',
        image='alpine',
        command=['echo', 'Hello Kubeflow']
    )
medium
A. The pipeline prints 'Hello Kubeflow' in the step logs
B. The pipeline fails because 'echo' is not a valid command
C. The pipeline prints 'Hello Kubeflow' on the console where pipeline is defined
D. The pipeline does nothing because no output is defined

Solution

  1. Step 1: Understand ContainerOp execution

    The ContainerOp runs a container with the alpine image and executes the command 'echo Hello Kubeflow'. This prints to the container's standard output logs.
  2. Step 2: Identify where output appears

    The output appears in the step logs inside Kubeflow Pipelines UI, not on the local console or nowhere.
  3. Final Answer:

    The pipeline prints 'Hello Kubeflow' in the step logs -> Option A
  4. Quick Check:

    ContainerOp command output = step logs [OK]
Hint: Container output appears in step logs, not local console [OK]
Common Mistakes:
  • Thinking output appears on local console
  • Assuming 'echo' command is invalid in alpine
  • Believing pipeline does nothing without explicit output
4. You wrote this Kubeflow Pipeline step but it fails to run:
def step():
    return dsl.ContainerOp(name='step', image='python:3.8')
What is the most likely cause of the failure?
medium
A. Missing the command argument to specify what to run inside the container
B. The image 'python:3.8' does not exist
C. The step name cannot be 'step'
D. ContainerOp requires a volume to run

Solution

  1. Step 1: Check ContainerOp requirements

    Without a command, the container runs only the image's default command. For python:3.8 that is the Python interpreter with no script, which typically exits right away without doing the step's work.
  2. Step 2: Validate other options

    Image 'python:3.8' exists on Docker Hub, step name can be any string, and volume is optional.
  3. Final Answer:

    Missing the command argument to specify what to run inside the container -> Option A
  4. Quick Check:

    ContainerOp needs command to run [OK]
Hint: Always specify command in ContainerOp to avoid immediate exit [OK]
Common Mistakes:
  • Assuming image is missing or invalid
  • Thinking step name is restricted
  • Believing volume is mandatory
5. You want to create a Kubeflow Pipeline that runs two steps sequentially: first preprocess data, then train a model. Which code snippet correctly defines this dependency?
hard
A.
    step1 = dsl.ContainerOp(name='preprocess', image='python:3.8', command=['python', 'preprocess.py'])
    step2 = dsl.ContainerOp(name='train', image='python:3.8', command=['python', 'train.py'])
    step1.after(step2)
B.
    step1 = dsl.ContainerOp(name='preprocess', image='python:3.8', command=['python', 'preprocess.py'])
    step2 = dsl.ContainerOp(name='train', image='python:3.8', command=['python', 'train.py'])
    step2.after(step1)
C.
    step1 = dsl.ContainerOp(name='preprocess', image='python:3.8', command=['python', 'preprocess.py'])
    step2 = dsl.ContainerOp(name='train', image='python:3.8', command=['python', 'train.py'])
    step1.before(step2)
D.
    step1 = dsl.ContainerOp(name='preprocess', image='python:3.8', command=['python', 'preprocess.py'])
    step2 = dsl.ContainerOp(name='train', image='python:3.8', command=['python', 'train.py'])

Solution

  1. Step 1: Understand step dependencies in Kubeflow Pipelines

    To run step2 after step1, use step2.after(step1) to set the order.
  2. Step 2: Analyze each option

    Option B calls step2.after(step1), which makes step2 wait for step1. Option A calls step1.after(step2), which reverses the order. Option C calls step1.before(step2), but ContainerOp has no before() method. Option D sets no dependency, so the two steps may run in parallel.
  3. Final Answer:

    step2.after(step1) -> Option B
  4. Quick Check:

    Use step2.after(step1) for sequential steps [OK]
Hint: Use step2.after(step1) to run steps sequentially [OK]
Common Mistakes:
  • Reversing the order with after()
  • Using before() which does not exist
  • Not setting any dependency causing parallel runs