Building a Simple MLOps Pipeline with Components and DAGs
📖 Scenario: You are working as a data engineer in a team that builds machine learning pipelines. Your task is to create a simple pipeline that has components for data loading, data preprocessing, and model training. These components will be connected in a Directed Acyclic Graph (DAG) to define the order of execution. This project will help you understand how pipeline components and DAGs work in MLOps.
🎯 Goal: Build a simple MLOps pipeline using Python dictionaries to represent components and a list to represent the DAG order. You will create components for data loading, preprocessing, and training, then connect them in a DAG, and finally print the execution order.
📋 What You'll Learn
Create a dictionary called components with keys 'load_data', 'preprocess_data', and 'train_model' each having a string description as value.
Create a list called dag that defines the execution order of the components as 'load_data', 'preprocess_data', 'train_model'.
Use a for loop to iterate over the dag list and print the component name and its description from the components dictionary.
💡 Why This Matters
🌍 Real World
In real MLOps, pipelines are built with components representing tasks like data loading, preprocessing, and training. These tasks are connected in a DAG to control the order of execution.
💼 Career
Understanding pipeline components and DAGs is essential for roles like MLOps engineer, data engineer, and machine learning engineer to automate and manage ML workflows efficiently.
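To make the idea of "tasks connected in a DAG" concrete, here is a minimal sketch that derives an execution order from declared dependencies instead of writing the order by hand. It uses Python's standard-library graphlib module (Python 3.9+). The component names come from this project; the dependency sets are an assumption made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each component to the set of components it depends on.
# The names mirror this project; the dependency sets are assumed.
dependencies = {
    'load_data': set(),
    'preprocess_data': {'load_data'},
    'train_model': {'preprocess_data'},
}

# static_order() yields each node only after all of its predecessors.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

Because each component here depends on exactly one earlier component, only one valid order exists, and it matches the hand-written list used in this project.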
1
Create pipeline components dictionary
Create a dictionary called components with these exact entries: 'load_data': 'Load raw data from source', 'preprocess_data': 'Clean and transform data', and 'train_model': 'Train ML model on processed data'.
Hint
Use curly braces {} to create a dictionary. Each key is a string like 'load_data' and each value is a string description.
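As a rough illustration of the hint, the sketch below builds the dictionary from this step and shows two common ways to read from it. The lookups at the end are extra and not required by the exercise.

```python
# Keys are component names, values are human-readable descriptions.
components = {
    'load_data': 'Load raw data from source',
    'preprocess_data': 'Clean and transform data',
    'train_model': 'Train ML model on processed data',
}

# Look up one description by key.
print(components['load_data'])

# Membership tests check the keys, not the values.
print('train_model' in components)
```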
2
Define the DAG execution order
Create a list called dag with the exact order of component names: 'load_data', 'preprocess_data', 'train_model'.
Hint
Use square brackets [] to create a list. Put the component names as strings in the correct order.
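A short sketch of the list this step describes. A Python list keeps its items in insertion order, which is why it can stand in for a linear DAG here. The names first_step and last_step are illustrative and not part of the exercise.

```python
# The position of each name in the list is its position in the pipeline.
dag = ['load_data', 'preprocess_data', 'train_model']

# Index 0 is the first component to run, index -1 the last.
first_step = dag[0]
last_step = dag[-1]
print(first_step, '->', last_step)
```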
3
Iterate over DAG and print component info
Use a for loop with variable component to iterate over the dag list. Inside the loop, get the description from components[component] and print the component name and description in this format: "Component: {component}, Description: {description}".
Hint
Use a for loop to go through each item in dag. Use f-strings to format the print output.
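One possible way to put the loop together is sketched below. The dictionary and list are repeated so the snippet runs on its own, and the lines list is an addition for illustration that collects the same text that gets printed.

```python
components = {
    'load_data': 'Load raw data from source',
    'preprocess_data': 'Clean and transform data',
    'train_model': 'Train ML model on processed data',
}
dag = ['load_data', 'preprocess_data', 'train_model']

lines = []  # illustrative: keeps a copy of each printed line
for component in dag:
    # The list supplies the order; the dictionary supplies the description.
    description = components[component]
    line = f"Component: {component}, Description: {description}"
    print(line)
    lines.append(line)
```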
4
Print the pipeline execution order
Write a print statement to display the text exactly: "Pipeline execution order completed."
Hint
Use print("Pipeline execution order completed.") exactly to show the final message.
Practice
1. What does a Directed Acyclic Graph (DAG) represent in an MLOps pipeline?
easy
A. Tasks and their dependencies without any cycles
B. A loop of tasks that repeat indefinitely
C. Random tasks executed in parallel without order
D. Only the final output of a pipeline
Solution
Step 1: Understand DAG structure
A DAG is a graph with nodes and edges where edges show dependencies and no cycles exist.
Step 2: Relate DAG to pipeline tasks
In MLOps, tasks are nodes and dependencies are edges, ensuring tasks run in order without loops.
Final Answer:
Tasks and their dependencies without any cycles -> Option A
Quick Check:
DAG = tasks + dependencies without loops [OK]
Hint: DAG means no loops, just tasks linked in order [OK]
Common Mistakes:
Thinking DAG allows loops
Confusing DAG with random task order
Assuming DAG only shows final output
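The "acyclic" part of the definition can be checked mechanically. The sketch below uses the standard-library graphlib module (Python 3.9+), which raises CycleError when dependencies loop back on themselves. The helper has_cycle and both example graphs are hypothetical and exist only for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def has_cycle(graph):
    """Return True if the dependency graph contains a cycle."""
    try:
        # static_order() is lazy, so consume it to trigger the check.
        list(TopologicalSorter(graph).static_order())
    except CycleError:
        return True
    return False


# Each key maps to the set of tasks it depends on.
valid_pipeline = {'train': {'preprocess'}, 'preprocess': {'load'}, 'load': set()}
broken_pipeline = {'train': {'preprocess'}, 'preprocess': {'train'}}

print(has_cycle(valid_pipeline))
print(has_cycle(broken_pipeline))
```

The second graph is not a DAG: train waits for preprocess and preprocess waits for train, so neither could ever start.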
2. Which of the following is the correct syntax to define a simple DAG in Apache Airflow?
easy
A. dag = DAG('my_dag', interval='daily')
B. dag = DAG('my_dag' schedule='daily')
C. dag = DAG('my_dag', schedule='everyday')
D. dag = DAG('my_dag', schedule_interval='@daily')
Solution
Step 1: Check Airflow DAG syntax
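The DAG constructor takes a DAG ID as its first argument. In Airflow releases before 3.0 it also accepts a schedule_interval argument for timing; from Airflow 2.4 onward the schedule argument is preferred and schedule_interval is deprecated.
Step 2: Validate options
dag = DAG('my_dag', schedule_interval='@daily') uses the parameter 'schedule_interval' with the valid preset '@daily'. Option A uses a parameter named 'interval', which the constructor does not accept; option B is missing the comma between arguments; option C passes 'everyday', which is not a valid schedule value.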
Final Answer:
dag = DAG('my_dag', schedule_interval='@daily') -> Option D
Quick Check:
Correct DAG syntax uses schedule_interval [OK]
Hint: Use schedule_interval='@daily' for daily DAGs [OK]
Common Mistakes:
Using 'interval' instead of 'schedule_interval' (the 'schedule' name is accepted only from Airflow 2.4 onward)
Wrong interval value formats
Missing commas between parameters
3. Given this Airflow DAG snippet, what is the order of task execution?
task1 >> task2 >> task3
Solution
Step 1: Read the '>>' operator
The '>>' operator sets order: task1 before task2, task2 before task3.
Step 2: Determine execution sequence
Tasks run in sequence: task1 first, then task2, then task3.
Final Answer:
task1, then task2, then task3 -> Option B
Quick Check:
task1 >> task2 >> task3 means sequential order [OK]
Hint: >> means run left task before right task [OK]
Common Mistakes:
Assuming tasks run in reverse order
Thinking tasks run in parallel
Ignoring the '>>' operator meaning
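To show why chaining with '>>' produces a left-to-right order, here is a toy imitation written in plain Python. The Task class and run_order helper are hypothetical stand-ins and are not Airflow's implementation. They only demonstrate how overloading __rshift__ and returning the right-hand operand makes the chain work.

```python
class Task:
    """Toy stand-in for a pipeline task; not Airflow's operator class."""

    def __init__(self, name):
        self.name = name
        self.downstream = []  # tasks that must wait for this one

    def __rshift__(self, other):
        # `self >> other` records that `other` runs after `self`.
        self.downstream.append(other)
        return other  # returning the right operand lets calls chain


def run_order(start):
    """Follow a linear chain from `start` and return task names in order."""
    order, current = [], start
    while current is not None:
        order.append(current.name)
        current = current.downstream[0] if current.downstream else None
    return order


task1, task2, task3 = Task('task1'), Task('task2'), Task('task3')
task1 >> task2 >> task3  # evaluated as (task1 >> task2) >> task3
print(run_order(task1))
```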
4. You wrote this DAG code but get an error: TypeError: 'DAG' object is not iterable. What is the likely cause?
with DAG('example_dag', schedule_interval='@daily') as dag:
    task1 = DummyOperator(task_id='task1')
    task2 = DummyOperator(task_id='task2')
    task1 >> task2
for task in dag:
    print(task.task_id)
medium
A. DAG object is not iterable, so 'for task in dag' causes error
B. DummyOperator requires a 'dag' parameter outside the context
C. Missing import for DummyOperator
D. schedule_interval '@daily' is invalid
Solution
Step 1: Identify error cause
The error says 'DAG' object is not iterable, likely from trying to loop over dag object.
Step 2: Understand DAG iterability
DAG objects in Airflow are not iterable directly; looping over them causes this error.
Final Answer:
DAG object is not iterable, so 'for task in dag' causes error -> Option A
Quick Check:
DAG is not iterable; use dag.tasks list instead [OK]
Hint: DAG is not iterable; use dag.tasks to loop [OK]
Common Mistakes:
Trying to loop directly over DAG object
Assuming DummyOperator needs dag param outside context
Misreading error as import issue
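The same TypeError can be reproduced without Airflow. The sketch below uses a hypothetical ToyDag class that, like the DAG object in the question, holds its tasks in a tasks attribute but defines no __iter__ method. For brevity it stores plain task ID strings rather than operator objects.

```python
class ToyDag:
    """Stand-in for a DAG object: it holds tasks but defines no __iter__."""

    def __init__(self, task_ids):
        self.tasks = list(task_ids)  # the attribute that can be looped over


dag = ToyDag(['task1', 'task2'])

message = None
try:
    for task_id in dag:  # fails: ToyDag has no __iter__ or __getitem__
        print(task_id)
except TypeError as error:
    message = str(error)
    print(message)

# Looping over the list attribute works.
task_ids = [task_id for task_id in dag.tasks]
print(task_ids)
```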
5. You want to create a pipeline where task A runs first, then tasks B and C run in parallel, and finally task D runs after both B and C finish. Which DAG structure correctly represents this?
hard
A. [A, B] >> C >> D
B. A >> B >> C >> D
C. A >> [B, C] >> D
D. A >> D >> [B, C]
Solution
Step 1: Understand task order requirements
Task A runs first, then B and C run at the same time, then D runs after both finish.
Step 2: Translate to DAG syntax
Using Airflow syntax, 'A >> [B, C] >> D' means A before B and C in parallel, then D after both.
Final Answer:
A >> [B, C] >> D -> Option C
Quick Check:
Parallel tasks in list brackets between sequential tasks [OK]
Hint: Use brackets [] for parallel tasks in DAG [OK]
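The fan-out and fan-in shape of A >> [B, C] >> D can also be explored with the standard-library graphlib module (Python 3.9+). The sketch below is not Airflow code. It writes the same dependencies as a predecessor map and groups tasks into "waves" that could run at the same time; the variable names are illustrative.

```python
from graphlib import TopologicalSorter

# Predecessor map for A >> [B, C] >> D.
graph = {'A': set(), 'B': {'A'}, 'C': {'A'}, 'D': {'B', 'C'}}

sorter = TopologicalSorter(graph)
sorter.prepare()

waves = []
while sorter.is_active():
    # get_ready() returns tasks whose dependencies have all finished.
    ready = sorted(sorter.get_ready())
    waves.append(ready)
    sorter.done(*ready)  # mark the whole wave as finished

print(waves)
```

B and C land in the same wave because both depend only on A, while D appears last because it waits for both of them.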