Practice

1. What does a Directed Acyclic Graph (DAG) represent in an MLOps pipeline?

easy

A. Tasks and their dependencies without any cycles

B. A loop of tasks that repeat indefinitely

C. Random tasks executed in parallel without order

D. Only the final output of a pipeline

Solution

Step 1: Understand DAG structure
A DAG is a graph of nodes connected by directed edges. Each edge expresses a dependency, and no path leads from a node back to itself.
Step 2: Relate DAG to pipeline tasks
In MLOps, tasks are nodes and dependencies are edges, ensuring tasks run in order without loops.
Final Answer:
Tasks and their dependencies without any cycles -> Option A
Quick Check:
DAG = tasks + dependencies without loops [OK]

Hint: DAG means no loops, just tasks linked in order [OK]

Common Mistakes:

Thinking DAG allows loops
Confusing DAG with random task order
Assuming DAG only shows final output
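
The "no cycles" property can be checked with a short sketch using Python's standard graphlib module. The task names (ingest, preprocess, train, evaluate) are hypothetical and only illustrate the idea; this is not how any particular orchestrator stores its graph.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical pipeline: each key is a task, each value is the set of
# tasks it depends on (its upstream tasks).
pipeline = {
    "train": {"preprocess"},
    "preprocess": {"ingest"},
    "evaluate": {"train"},
    "ingest": set(),
}

# A valid DAG always has an ordering that respects every dependency.
order = list(TopologicalSorter(pipeline).static_order())
print(order)  # ['ingest', 'preprocess', 'train', 'evaluate']

# Adding an edge from the last task back to the first creates a cycle,
# so no valid ordering exists and graphlib raises CycleError.
cyclic = dict(pipeline)
cyclic["ingest"] = {"evaluate"}
try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)  # True
```

Because the example pipeline is a single chain, only one ordering is possible. Graphs with independent branches can have several valid orderings.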

2. Which of the following is the correct syntax to define a simple DAG in Apache Airflow?

easy

A. dag = DAG('my_dag', interval='daily')

B. dag = DAG('my_dag' schedule='daily')

C. dag = DAG('my_dag', schedule='everyday')

D. dag = DAG('my_dag', schedule_interval='@daily')

Solution

Step 1: Check Airflow DAG syntax
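The DAG constructor takes a DAG ID as its first argument. In Airflow releases before 2.4, the schedule is set with the schedule_interval parameter. Airflow 2.4 deprecated schedule_interval in favor of schedule, and Airflow 3 removed it, so this question assumes an older release.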
Step 2: Validate options
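dag = DAG('my_dag', schedule_interval='@daily') uses the parameter schedule_interval with the valid preset '@daily'. Option A uses a nonexistent parameter (interval), option B is missing the comma between arguments, and option C passes 'everyday', which is not a valid schedule value.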
Final Answer:
dag = DAG('my_dag', schedule_interval='@daily') -> Option D
Quick Check:
Correct DAG syntax uses schedule_interval [OK]

Hint: Use schedule_interval='@daily' for daily DAGs [OK]

Common Mistakes:

Using 'schedule' instead of 'schedule_interval' on Airflow releases older than 2.4 (newer releases accept 'schedule')
Wrong interval value formats
Missing commas between parameters

3. Given this Airflow DAG snippet, what is the order of task execution?

task1 = DummyOperator(task_id='task1', dag=dag)
task2 = DummyOperator(task_id='task2', dag=dag)
task3 = DummyOperator(task_id='task3', dag=dag)
task1 >> task2 >> task3

medium

A. task3, then task2, then task1

B. task1, then task2, then task3

C. task2, then task1, then task3

D. All tasks run in parallel

Solution

Step 1: Analyze task dependencies
The '>>' operator sets order: task1 before task2, task2 before task3.
Step 2: Determine execution sequence
Tasks run in sequence: task1 first, then task2, then task3.
Final Answer:
task1, then task2, then task3 -> Option B
Quick Check:
task1 >> task2 >> task3 means sequential order [OK]

Hint: >> means run left task before right task [OK]

Common Mistakes:

Assuming tasks run in reverse order
Thinking tasks run in parallel
Ignoring the '>>' operator meaning
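
To see why chaining with >> produces a left-to-right order, here is a minimal sketch of a toy Task class that overloads Python's __rshift__ method. The Task class and its downstream attribute are hypothetical stand-ins, not Airflow's own operator classes.

```python
class Task:
    """Toy stand-in for an operator; not Airflow's actual class."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []  # tasks that must wait for this one

    def __rshift__(self, other):
        # self >> other: record that `other` runs after `self`.
        self.downstream.append(other)
        # Returning `other` lets chains like a >> b >> c work left to right.
        return other


task1, task2, task3 = Task("task1"), Task("task2"), Task("task3")
task1 >> task2 >> task3

# Walk the chain from task1 to recover the execution order.
order = []
current = task1
while current is not None:
    order.append(current.task_id)
    current = current.downstream[0] if current.downstream else None
print(order)  # ['task1', 'task2', 'task3']
```

Python evaluates task1 >> task2 first. Since that expression returns task2, the second >> links task2 to task3.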

4. You wrote this DAG code but get an error: TypeError: 'DAG' object is not iterable. What is the likely cause?

with DAG('example_dag', schedule_interval='@daily') as dag:
    task1 = DummyOperator(task_id='task1')
    task2 = DummyOperator(task_id='task2')
    task1 >> task2

for task in dag:
    print(task.task_id)

medium

A. DAG object is not iterable, so 'for task in dag' causes error

B. DummyOperator requires a 'dag' parameter outside the context

C. Missing import for DummyOperator

D. schedule_interval '@daily' is invalid

Solution

Step 1: Identify error cause
The message 'DAG' object is not iterable points at the line 'for task in dag', which tries to loop over the DAG object itself.
Step 2: Understand DAG iterability
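DAG objects in Airflow are not directly iterable, so looping over them raises this error. Tasks created inside the 'with' block are attached to the DAG automatically, and the list of them is available as dag.tasks.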
Final Answer:
DAG object is not iterable, so 'for task in dag' causes error -> Option A
Quick Check:
DAG is not iterable; use dag.tasks list instead [OK]

Hint: DAG is not iterable; use dag.tasks to loop [OK]

Common Mistakes:

Trying to loop directly over DAG object
Assuming DummyOperator needs dag param outside context
Misreading error as import issue
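
The same TypeError can be reproduced without Airflow. The sketch below uses a hypothetical MiniDag class that, like the DAG in the question, keeps its tasks in a tasks list but defines no __iter__ method.

```python
class MiniDag:
    """Toy container, hypothetical; it only mimics the relevant behavior."""

    def __init__(self, dag_id):
        self.dag_id = dag_id
        self.tasks = []  # plain list, so it is iterable

    def add(self, task_id):
        self.tasks.append(task_id)


dag = MiniDag("example_dag")
dag.add("task1")
dag.add("task2")

# Looping over the object itself fails because it has no __iter__ method.
try:
    for task in dag:
        pass
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'MiniDag' object is not iterable

# Looping over the task list works.
task_ids = [task_id for task_id in dag.tasks]
print(task_ids)  # ['task1', 'task2']
```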

5. You want to create a pipeline where task A runs first, then tasks B and C run in parallel, and finally task D runs after both B and C finish. Which DAG structure correctly represents this?

hard

A. [A, B] >> C >> D

B. A >> B >> C >> D

C. A >> [B, C] >> D

D. A >> D >> [B, C]

Solution

Step 1: Understand task order requirements
Task A runs first, then B and C run at the same time, then D runs after both finish.
Step 2: Translate to DAG syntax
In Airflow syntax, 'A >> [B, C] >> D' makes B and C both downstream of A, so they can run in parallel, and makes D downstream of both B and C, so it waits for both to finish.
Final Answer:
A >> [B, C] >> D -> Option C
Quick Check:
Parallel tasks in list brackets between sequential tasks [OK]

Hint: Use brackets [] for parallel tasks in DAG [OK]

Common Mistakes:

Placing tasks in wrong order
Not using brackets for parallel tasks
Assuming linear order for all tasks
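
The fan-out and fan-in shape of A >> [B, C] >> D can be sketched with the standard graphlib module, which reports the tasks that are ready to run at each step. This is an illustration of the dependency structure only; it assumes nothing about how a real scheduler assigns workers.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of tasks it waits for: A >> [B, C] >> D.
graph = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

sorter = TopologicalSorter(graph)
sorter.prepare()

waves = []
while sorter.is_active():
    # Tasks whose dependencies are all done; sorted for stable output.
    ready = sorted(sorter.get_ready())
    waves.append(ready)
    sorter.done(*ready)  # mark the whole wave as finished
print(waves)  # [['A'], ['B', 'C'], ['D']]
```

B and C appear in the same wave because neither depends on the other. D appears only after both are marked done.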

Input Size (n tasks)	Approx. Operations (3 dependencies per task)
10	About 30 checks
100	About 300 checks
1000	About 3000 checks
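
The linear growth in the table can be reproduced with a small sketch that counts one check per (task, dependency) pair. The synthetic graph below is an assumption made for illustration: every task waits on the same three hypothetical source nodes, which are not counted among the n tasks.

```python
def build_pipeline(n_tasks, deps_per_task=3):
    """Build a synthetic graph: n_tasks tasks, each waiting on the same
    deps_per_task source nodes (the sources have no dependencies)."""
    sources = [f"source_{i}" for i in range(deps_per_task)]
    return {f"task_{i}": set(sources) for i in range(n_tasks)}


def count_dependency_checks(graph):
    """One check per (task, dependency) pair, i.e. one per edge."""
    return sum(len(deps) for deps in graph.values())


for n in (10, 100, 1000):
    print(n, count_dependency_checks(build_pipeline(n)))
# 10 30
# 100 300
# 1000 3000
```

With a fixed number of dependencies per task, the number of checks grows in proportion to the number of tasks.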

Pipeline components and DAGs in MLOps - Time & Space Complexity
