Recall & Review
beginner
What is a pipeline in MLOps?
A pipeline is a series of connected steps that process data and train models automatically, like an assembly line in a factory.
beginner
What does DAG stand for and why is it important in pipelines?
DAG stands for Directed Acyclic Graph. It shows the order of steps in a pipeline without loops, ensuring tasks run in the right sequence.
beginner
Name three common components of an MLOps pipeline.
Data ingestion, data processing, and model training are three common pipeline components.
intermediate
How does a DAG help prevent errors in pipeline execution?
By defining a clear order without cycles, a DAG prevents tasks from running before their dependencies, avoiding confusion and errors.
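As a minimal sketch of that ordering guarantee, the standard-library graphlib module can compute a valid run order from a dependency map. The step names below are hypothetical, and this is not how any particular orchestrator is implemented.

```python
from graphlib import TopologicalSorter

# Map each step to the set of steps it depends on (hypothetical step names).
dependencies = {
    "clean_data": {"ingest_data"},
    "train_model": {"clean_data"},
    "evaluate_model": {"train_model"},
}

# static_order() yields a step only after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['ingest_data', 'clean_data', 'train_model', 'evaluate_model']
```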
intermediate
What happens if a pipeline step fails in a DAG-based system?
The pipeline stops or retries the failed step, preventing later steps from running with bad data or incomplete results.
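The stop-or-retry behavior can be sketched in plain Python. The run_pipeline helper, its max_retries parameter, and the step functions are all hypothetical names chosen for illustration; real orchestrators offer their own retry settings.

```python
def run_pipeline(steps, max_retries=1):
    """Run steps in order; retry a failing step, then stop the pipeline."""
    completed = []
    for name, func in steps:
        for attempt in range(max_retries + 1):
            try:
                func()
                completed.append(name)
                break  # step succeeded, move on to the next one
            except Exception as exc:
                print(f"{name} failed on attempt {attempt + 1}: {exc}")
        else:
            # Every attempt failed: later steps never run.
            return completed, name
    return completed, None


def ingest():
    pass


def clean():
    raise ValueError("missing column")


def train():
    pass


completed, failed = run_pipeline(
    [("ingest", ingest), ("clean", clean), ("train", train)]
)
print(completed, failed)  # ['ingest'] clean
```

Here train never runs, because clean failed on both attempts.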
Which of the following is NOT typically a pipeline component?
A. User interface design
B. Data cleaning
C. Model deployment
D. Model training
Answer: A
User interface design is not a pipeline component; pipelines focus on data and model steps.
Why must a DAG be acyclic?
A. To allow tasks to run in parallel
B. To speed up the pipeline
C. To avoid infinite loops in task execution
D. To reduce storage needs
Answer: C
A DAG must have no cycles so that execution cannot loop forever and every task can finish.
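To see why a cycle is a problem, here is a small sketch using the standard-library graphlib module. The two-step graph is a made-up example in which each step waits for the other, so no valid run order exists.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical graph with a cycle: each step depends on the other.
cyclic = {
    "train_model": {"evaluate_model"},
    "evaluate_model": {"train_model"},
}

try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError as err:
    has_cycle = True
    print("No valid execution order:", err.args[0])
```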
Which component typically comes first in an MLOps pipeline?
A. Model evaluation
B. Feature engineering
C. Model deployment
D. Data ingestion
Answer: D
Data ingestion is the first step; it brings data into the pipeline.
What is the main role of a DAG in pipeline management?
A. To schedule tasks in order
B. To store data
C. To visualize model accuracy
D. To monitor hardware usage
Answer: A
A DAG defines task dependencies so that tasks are scheduled in the correct order.
If a pipeline step depends on another, what does the DAG ensure?
A. Both steps run simultaneously
B. The dependency runs before the dependent step
C. The dependent step runs first
D. The steps run randomly
Answer: B
The DAG ensures dependencies run before dependent steps.
Explain what a pipeline is and describe the role of DAGs in managing pipeline steps.
Think of a pipeline as a recipe and the DAG as the step-by-step instructions.
List common components of an MLOps pipeline and explain why the order of these components matters.
Consider what happens if you train a model before cleaning data.
Practice
1. What does a Directed Acyclic Graph (DAG) represent in an MLOps pipeline?
easy
A. Tasks and their dependencies without any cycles
B. A loop of tasks that repeat indefinitely
C. Random tasks executed in parallel without order
D. Only the final output of a pipeline
Solution
Step 1: Understand DAG structure
A DAG is a graph with nodes and edges where edges show dependencies and no cycles exist.
Step 2: Relate DAG to pipeline tasks
In MLOps, tasks are nodes and dependencies are edges, ensuring tasks run in order without loops.
Final Answer:
Tasks and their dependencies without any cycles -> Option A
Quick Check:
DAG = tasks + dependencies without loops
Hint: DAG means no loops, just tasks linked in order
Common Mistakes:
Thinking DAG allows loops
Confusing DAG with random task order
Assuming DAG only shows final output
2. Which of the following is the correct syntax to define a simple DAG in Apache Airflow?
easy
A. dag = DAG('my_dag', interval='daily')
B. dag = DAG('my_dag' schedule='daily')
C. dag = DAG('my_dag', schedule='everyday')
D. dag = DAG('my_dag', schedule_interval='@daily')
Solution
Step 1: Check Airflow DAG syntax
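The DAG constructor takes a DAG ID as its first argument. In Airflow releases before 2.4, timing is set with the schedule_interval parameter; Airflow 2.4 introduced schedule as the preferred name, and schedule_interval was removed in Airflow 3.
Step 2: Validate options
dag = DAG('my_dag', schedule_interval='@daily') uses the parameter 'schedule_interval' with the valid preset '@daily'. Option A uses an unknown parameter 'interval', option B is missing the comma between arguments, and option C passes 'everyday', which is not a valid schedule value.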
Final Answer:
dag = DAG('my_dag', schedule_interval='@daily') -> Option D
Quick Check:
Of these options, only schedule_interval='@daily' is valid DAG syntax
Hint: '@daily' is the preset for a once-a-day schedule
Common Mistakes:
Using a parameter name the constructor does not accept, such as 'interval'
Wrong interval value formats
Missing commas between parameters
3. Given an Airflow DAG snippet that chains its tasks as task1 >> task2 >> task3, what is the order of task execution?
Solution
Step 1: Read the '>>' operator
The '>>' operator sets order: task1 before task2, task2 before task3.
Step 2: Determine execution sequence
Tasks run in sequence: task1 first, then task2, then task3.
Final Answer:
task1, then task2, then task3 -> Option B
Quick Check:
task1 >> task2 >> task3 means sequential order
Hint: >> means run left task before right task
Common Mistakes:
Assuming tasks run in reverse order
Thinking tasks run in parallel
Ignoring the '>>' operator meaning
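To make the meaning of '>>' concrete, here is a toy sketch of how a chaining operator can be built in Python with __rshift__. The Task class is a hypothetical stand-in, not Airflow's implementation; it only records which task comes next.

```python
class Task:
    """Toy stand-in for a pipeline task (not Airflow's operator class)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # `self >> other` records that `other` runs after `self`.
        self.downstream.append(other)
        # Returning `other` lets the chain continue: a >> b >> c.
        return other


task1, task2, task3 = Task("task1"), Task("task2"), Task("task3")
task1 >> task2 >> task3

# Walk the chain from task1 to confirm the recorded order.
chain = []
node = task1
while node is not None:
    chain.append(node.task_id)
    node = node.downstream[0] if node.downstream else None
print(chain)  # ['task1', 'task2', 'task3']
```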
4. You wrote this DAG code but get an error: TypeError: 'DAG' object is not iterable. What is the likely cause?
with DAG('example_dag', schedule_interval='@daily') as dag:
    task1 = DummyOperator(task_id='task1')
    task2 = DummyOperator(task_id='task2')
    task1 >> task2
for task in dag:
    print(task.task_id)
medium
A. DAG object is not iterable, so 'for task in dag' causes error
B. DummyOperator requires a 'dag' parameter outside the context
C. Missing import for DummyOperator
D. schedule_interval '@daily' is invalid
Solution
Step 1: Identify error cause
The error says the 'DAG' object is not iterable, which points to the 'for task in dag' loop over the dag object.
Step 2: Understand DAG iterability
DAG objects in Airflow are not iterable directly; looping over them causes this error.
Final Answer:
DAG object is not iterable, so 'for task in dag' causes error -> Option A
Quick Check:
DAG is not iterable; use dag.tasks list instead
Hint: DAG is not iterable; use dag.tasks to loop
Common Mistakes:
Trying to loop directly over DAG object
Assuming DummyOperator needs dag param outside context
Misreading error as import issue
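The same error can be reproduced without Airflow. In this sketch, MiniDag is a hypothetical stand-in class that defines no __iter__ and exposes its tasks through a tasks attribute, mirroring the dag.tasks fix named in the hint.

```python
class MiniDag:
    """Toy stand-in for a DAG object; it defines no __iter__ on purpose."""

    def __init__(self, task_ids):
        # Tasks are reachable through an attribute, not through the object itself.
        self.tasks = list(task_ids)


dag = MiniDag(["task1", "task2"])

try:
    for task in dag:  # fails: the object itself is not iterable
        print(task)
except TypeError as err:
    message = str(err)
    print(message)

# Looping over the task list works.
task_ids = [task for task in dag.tasks]
print(task_ids)
```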
5. You want to create a pipeline where task A runs first, then tasks B and C run in parallel, and finally task D runs after both B and C finish. Which DAG structure correctly represents this?
hard
A. [A, B] >> C >> D
B. A >> B >> C >> D
C. A >> [B, C] >> D
D. A >> D >> [B, C]
Solution
Step 1: Understand task order requirements
Task A runs first, then B and C run at the same time, then D runs after both finish.
Step 2: Translate to DAG syntax
Using Airflow syntax, 'A >> [B, C] >> D' means A before B and C in parallel, then D after both.
Final Answer:
A >> [B, C] >> D -> Option C
Quick Check:
Parallel tasks in list brackets between sequential tasks
Hint: Use brackets [] for parallel tasks in DAG
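The fan-out and fan-in shape of A >> [B, C] >> D can be checked with the standard-library graphlib module. This sketch groups tasks into batches that could run at the same time; it illustrates the graph structure only and is not Airflow's scheduler.

```python
from graphlib import TopologicalSorter

# A >> [B, C] >> D written as "task: set of tasks it depends on".
graph = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

sorter = TopologicalSorter(graph)
sorter.prepare()

batches = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
    batches.append(ready)
    sorter.done(*ready)  # mark the batch finished to unlock the next one
print(batches)  # [['A'], ['B', 'C'], ['D']]
```

B and C land in the same batch because neither depends on the other, while D waits for both.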