In an ML pipeline, a Directed Acyclic Graph (DAG) defines the order of tasks. Which statement best describes a DAG?
Think about what 'acyclic' means in the graph context.
A DAG is a graph with directed edges and no cycles, so tasks run in a defined order without loops.
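To make "no cycles" concrete, here is a minimal sketch using Python's standard-library `graphlib` (Python 3.9+). The helper name `is_dag` and the two small task mappings are hypothetical, chosen only for illustration; each key is a task and each value is the set of tasks it depends on.

```python
from graphlib import TopologicalSorter, CycleError


def is_dag(dependencies):
    """Return True if the task -> prerequisites mapping contains no cycle."""
    try:
        # prepare() raises CycleError when the graph contains a cycle.
        TopologicalSorter(dependencies).prepare()
        return True
    except CycleError:
        return False


# Hypothetical examples: a valid chain, and two tasks that wait on each other.
acyclic = {"train": {"preprocess"}, "evaluate": {"train"}}
cyclic = {"train": {"preprocess"}, "preprocess": {"train"}}

print(is_dag(acyclic))
print(is_dag(cyclic))
```

In the cyclic mapping neither task could ever start, because each waits on the other. That is why pipeline definitions must be acyclic.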
Given the following command to run an ML pipeline task:
mlflow run . -e train_model
What is the expected output if the task runs successfully?
Consider what happens when a valid entry point is executed in MLflow.
If the entry point exists and its command exits without error, MLflow marks the run as finished. Any metrics or model artifacts that the training script logs are recorded with that run.
Which YAML snippet correctly defines a DAG with three tasks where preprocess runs before train, and train runs before evaluate?
Check the order of dependencies to ensure no cycles and correct sequence.
Option D correctly sets the dependencies: preprocess has none, train depends on preprocess, and evaluate depends on train, so the tasks run in that order.
You run a pipeline and the evaluate task fails with the error: FileNotFoundError: model.pkl not found. Which is the most likely cause?
Think about which task creates the model file and what happens if it is missing.
The evaluate task depends on the model file created by train. If train fails or is skipped, the file won't exist.
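One way to surface this failure earlier is to check for required inputs before a task starts. The sketch below is an assumption about how such a guard might look, not part of any particular orchestrator; the helper name `missing_inputs` is hypothetical, and a temporary directory stands in for the pipeline's working directory.

```python
from pathlib import Path
import tempfile


def missing_inputs(required_paths):
    """Return the required files that do not exist yet, as strings."""
    return [str(p) for p in map(Path, required_paths) if not p.exists()]


with tempfile.TemporaryDirectory() as workdir:
    model_path = Path(workdir) / "model.pkl"

    # Before train has run, evaluate's input is missing.
    before = missing_inputs([model_path])

    # Stand-in for the train task writing its output file.
    model_path.write_bytes(b"placeholder")

    # After train has produced the file, nothing is missing.
    after = missing_inputs([model_path])

print(before)
print(after)
```

A guard like this turns a late `FileNotFoundError` inside evaluate into an explicit message that the upstream train task did not produce its output.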
Given the following DAG dependencies:
tasks:
  data_ingest:
    depends_on: []
  feature_engineering:
    depends_on: [data_ingest]
  model_training:
    depends_on: [feature_engineering]
  model_validation:
    depends_on: [model_training]
  deployment:
    depends_on: [model_validation, feature_engineering]
What is the correct order of task execution?
Remember that a task can only run after all its dependencies have completed.
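The dependencies form a chain, so the order is data_ingest, feature_engineering, model_training, model_validation, deployment. Deployment depends on both model_validation and feature_engineering, so it runs last, after both have completed.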
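The execution order can also be derived mechanically with a topological sort. This is a minimal sketch using Python's standard-library `graphlib` (Python 3.9+); the dictionary mirrors the `depends_on` lists from the question, and the variable names are illustrative only.

```python
from graphlib import TopologicalSorter

# Each task maps to the tasks it depends on, copied from the question.
dependencies = {
    "data_ingest": [],
    "feature_engineering": ["data_ingest"],
    "model_training": ["feature_engineering"],
    "model_validation": ["model_training"],
    "deployment": ["model_validation", "feature_engineering"],
}

# static_order() yields every task only after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Because every task here has exactly one position that satisfies its dependencies, only one valid order exists. In graphs with independent branches, several orders can be valid and a scheduler may run those branches in parallel.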