Practice

1. What is the main purpose of using dvc repro in a DVC pipeline?

easy

A. To delete all pipeline data and cache

B. To initialize a new DVC repository

C. To reproduce pipeline stages and update outputs if inputs changed

D. To manually edit pipeline stage commands

Solution

Step 1: Understand the role of dvc repro
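This command checks whether any dependency of a pipeline stage (data, code, or parameters) has changed since the last run by comparing current file hashes with those recorded in dvc.lock. It also treats a stage as out of date if its outputs are missing or were modified.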
Step 2: Effect of running dvc repro
If changes are detected, it reruns the affected stages and any downstream stages that consume their outputs. It then records the new hashes in dvc.lock.
Final Answer:
To reproduce pipeline stages and update outputs if inputs changed -> Option C
Quick Check:
dvc repro updates pipeline outputs [OK]

Hint: repro (reproduce) reruns only the pipeline parts that changed [OK]

Common Mistakes:

Confusing repro with initialization commands
Thinking repro deletes data
Assuming repro edits pipeline commands
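To make Step 1 concrete: DVC records a hash for each dependency in dvc.lock, and dvc repro compares the current hashes against those records. The sketch below imitates that check in plain Python using MD5. The helper names file_md5 and stage_is_stale, and the plain dict standing in for dvc.lock, are hypothetical illustrations, not DVC's internal API.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_is_stale(deps: list[str], lock: dict[str, str], root: Path) -> bool:
    """Return True if any dependency's current hash differs from the recorded one.

    `lock` is a hypothetical stand-in for dvc.lock, mapping path -> recorded MD5.
    """
    return any(lock.get(dep) != file_md5(root / dep) for dep in deps)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "train.py").write_text("print('v1')\n")
        # Record hashes as if the stage had just been run.
        lock = {"train.py": file_md5(root / "train.py")}
        print(stage_is_stale(["train.py"], lock, root))  # False: nothing changed
        (root / "train.py").write_text("print('v2')\n")
        print(stage_is_stale(["train.py"], lock, root))  # True: dependency edited
```

The real tool also tracks outputs and parameters. This sketch keeps only the dependency comparison that decides whether a stage must rerun.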

2. Which of the following is the correct syntax to add a pipeline stage with DVC that runs python train.py and outputs model.pkl?

easy

A. dvc stage add -n train -o model.pkl python train.py

B. dvc add stage train -o model.pkl python train.py

C. dvc run -n train -o model.pkl python train.py

D. dvc stage add -n train -d train.py -o model.pkl python train.py

Solution

Step 1: Identify required flags for stage creation
Only -n (the stage name) is mandatory. The -d flag declares dependencies and -o declares outputs. Without -d train.py, DVC would not rerun the stage when the training script changes.
Step 2: Check which option includes all required flags correctly
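Option D uses -n train, -d train.py (dependency), and -o model.pkl with the command python train.py. The other options fall short:
- Option A omits the dependency.
- Option B misuses dvc add, which tracks data files rather than creating stages.
- Option C relies on the deprecated dvc run and also omits -d.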
Final Answer:
dvc stage add -n train -d train.py -o model.pkl python train.py -> Option D
Quick Check:
A complete stage declares its name, dependencies, and outputs [OK]

Hint: Stage add needs -n (name), -d (deps), -o (outputs) [OK]

Common Mistakes:

Omitting the dependency with -d
Using deprecated dvc run instead of stage add
Writing dvc add stage instead of dvc stage add
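The flags map directly onto the stage entry that dvc stage add writes to dvc.yaml:
- -n becomes the stage key.
- -d fills deps.
- -o fills outs.
- The remaining words become cmd.

The sketch below mimics that mapping with argparse. build_stage is a hypothetical helper for illustration, not part of DVC, and it models only -n, -d and -o out of the many flags the real command accepts.

```python
import argparse
import shlex


def build_stage(arguments: str) -> dict:
    """Turn the arguments of a `dvc stage add` call into a dvc.yaml-style dict.

    Hypothetical helper: only -n, -d and -o are modelled.
    """
    parser = argparse.ArgumentParser(prog="dvc stage add")
    parser.add_argument("-n", "--name", required=True)
    parser.add_argument("-d", "--deps", action="append", default=[])
    parser.add_argument("-o", "--outs", action="append", default=[])
    # Everything after the options is the command the stage will run.
    parser.add_argument("cmd", nargs=argparse.REMAINDER)
    args = parser.parse_args(shlex.split(arguments))

    stage = {"cmd": " ".join(args.cmd)}
    if args.deps:
        stage["deps"] = args.deps
    if args.outs:
        stage["outs"] = args.outs
    return {"stages": {args.name: stage}}


print(build_stage("-n train -d train.py -o model.pkl python train.py"))
# {'stages': {'train': {'cmd': 'python train.py', 'deps': ['train.py'], 'outs': ['model.pkl']}}}
```

Feeding it Option A's arguments produces a stage with no deps entry. Nothing would then tell DVC to rerun train when train.py is edited.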

3. Given this DVC pipeline stage definition in dvc.yaml:

stages:
  preprocess:
    cmd: python preprocess.py data/raw data/processed
    deps:
      - data/raw
      - preprocess.py
    outs:
      - data/processed

What happens when you run dvc repro after modifying data/raw?

medium

A. The preprocess stage reruns and updates data/processed

B. Nothing happens because only preprocess.py changes trigger rerun

C. The pipeline fails due to missing output specification

D. All pipeline stages rerun regardless of changes

Solution

Step 1: Identify dependencies of the preprocess stage
The stage depends on data/raw and preprocess.py.
Step 2: Effect of changing data/raw on dvc repro
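Because data/raw is listed under deps, its new hash no longer matches dvc.lock. DVC therefore reruns preprocess and regenerates data/processed, along with any later stages that depend on it.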
Final Answer:
The preprocess stage reruns and updates data/processed -> Option A
Quick Check:
Changed input triggers stage rerun [OK]

Hint: Change in deps triggers rerun of that stage [OK]

Common Mistakes:

Assuming no rerun if only data changes
Thinking all stages rerun always
Confusing outputs with dependencies
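The same rule extends beyond a single stage. A stage reruns when one of its dependencies changes, and so does any stage whose dependencies include that stage's outputs. The sketch below computes that set from a dict shaped like the stages section of dvc.yaml. The train stage and model.pkl are hypothetical additions to the preprocess example, and stages_to_rerun is an illustrative helper, not DVC's algorithm. It also matches paths by exact string, which is a simplification of how DVC handles directories.

```python
def stages_to_rerun(stages: dict, changed: set[str]) -> list[str]:
    """Return the stages affected by the `changed` paths, in order of discovery.

    A stage is affected if any of its deps changed or is produced by an
    affected stage (changes propagate downstream through outs -> deps).
    """
    dirty_paths = set(changed)
    affected: list[str] = []
    progress = True
    while progress:
        progress = False
        for name, spec in stages.items():
            if name in affected:
                continue
            if dirty_paths & set(spec.get("deps", [])):
                affected.append(name)
                # This stage's outputs will be regenerated, so they count as changed.
                dirty_paths |= set(spec.get("outs", []))
                progress = True
    return affected


pipeline = {
    "preprocess": {
        "cmd": "python preprocess.py data/raw data/processed",
        "deps": ["data/raw", "preprocess.py"],
        "outs": ["data/processed"],
    },
    # Hypothetical downstream stage, not part of the original example.
    "train": {
        "cmd": "python train.py",
        "deps": ["data/processed", "train.py"],
        "outs": ["model.pkl"],
    },
}

print(stages_to_rerun(pipeline, {"data/raw"}))  # ['preprocess', 'train']
print(stages_to_rerun(pipeline, {"train.py"}))  # ['train']
```

Editing data/raw touches both stages, while editing only train.py leaves preprocess alone. This is why Option D ("all stages rerun") is wrong.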

4. You run dvc repro but get an error: ERROR: failed to reproduce stage 'train': missing dependency 'data/train.csv'. What is the most likely cause?

medium

A. The file data/train.csv was deleted or moved after pipeline creation

B. The dvc.yaml file is missing the train stage

C. The dvc.lock file is corrupted

D. You forgot to run dvc init before dvc repro

Solution

Step 1: Understand the error message
The error says a dependency file is missing, which means DVC cannot find data/train.csv.
Step 2: Common causes of missing dependency errors
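Usually the file was deleted, renamed, or moved after the stage was created. If the file is tracked by DVC, dvc checkout restores it from the local cache and dvc pull fetches it from remote storage. If it was renamed on purpose, update the path in dvc.yaml.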
Final Answer:
The file data/train.csv was deleted or moved after pipeline creation -> Option A
Quick Check:
Missing dependency file causes repro error [OK]

Hint: Check if all dependency files exist before repro [OK]

Common Mistakes:

Assuming a missing train stage in dvc.yaml causes this error (the message names stage 'train', so DVC did find it)
Blaming dvc.lock corruption without evidence
Blaming a missing dvc init (outside a DVC repository the command fails with a different error before any stage is checked)
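Following the hint, a quick pre-flight check can list dependency paths that are absent from the workspace before you run dvc repro. This sketch reads the deps lists from a dvc.yaml-shaped dict. In practice you would load the file with a YAML parser such as PyYAML, which is assumed here rather than shown. missing_deps is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def missing_deps(stages: dict, root: Path) -> dict[str, list[str]]:
    """Map each stage name to the dependency paths that do not exist under root."""
    report: dict[str, list[str]] = {}
    for name, spec in stages.items():
        absent = [dep for dep in spec.get("deps", []) if not (root / dep).exists()]
        if absent:
            report[name] = absent
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "train.py").write_text("print('training')\n")
        # data/train.csv is deliberately not created, mimicking the error in the question.
        stages = {"train": {"cmd": "python train.py",
                            "deps": ["data/train.csv", "train.py"],
                            "outs": ["model.pkl"]}}
        print(missing_deps(stages, root))  # {'train': ['data/train.csv']}
```

An empty result means every declared dependency is present. Any remaining repro failure then lies elsewhere.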

5. You want to create a DVC pipeline with two stages: extract that outputs data/raw.csv, and train that depends on data/raw.csv and outputs model.pkl. Which sequence of commands correctly sets up this pipeline?

hard

A. dvc stage add -n train -o model.pkl python train.py
   dvc stage add -n extract -d data/raw.csv -o data/raw.csv python extract.py

B. dvc stage add -n extract -o data/raw.csv python extract.py
   dvc stage add -n train -d data/raw.csv -o model.pkl python train.py

C. dvc run -n extract -o data/raw.csv python extract.py
   dvc run -n train -d data/raw.csv -o model.pkl python train.py

D. dvc add data/raw.csv
   dvc add model.pkl

Solution

Step 1: Define extract stage with output only
The extract stage produces data/raw.csv, so it needs -n extract, -o data/raw.csv, and the command python extract.py.
Step 2: Define train stage depending on extract output
The train stage depends on data/raw.csv, so it needs -d data/raw.csv, -o model.pkl, and the command python train.py.
Step 3: Confirm correct order and commands
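Option B adds extract and then train. Train's -d data/raw.csv matches extract's -o data/raw.csv, and that shared path links the two stages so dvc repro runs extract before train. Option A fails because its extract stage depends on its own output and its train stage has no dependency. Option C uses the deprecated dvc run, and Option D only tracks files without defining any stages.
Final Answer:
dvc stage add -n extract -o data/raw.csv python extract.py, then dvc stage add -n train -d data/raw.csv -o model.pkl python train.py -> Option B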
Quick Check:
Define stages with correct deps and outputs [OK]

Hint: Give train a -d on the file extract outputs; that shared path connects the stages [OK]

Common Mistakes:

Making a stage depend on its own output, as Option A's extract stage does, which creates a cycle
Using dvc add instead of stage add for pipeline steps
Missing dependencies in train stage
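The link between the two stages is that train's -d data/raw.csv matches extract's -o data/raw.csv. That output-to-dependency edge is what orders extract before train. The sketch below builds the same kind of graph from dvc.yaml-shaped dicts using Python's standard graphlib module (Python 3.9+). It also shows that Option A's extract stage, which lists data/raw.csv as both a dependency and an output, forms a cycle. run_order is a hypothetical helper, and DVC's own implementation differs.

```python
from graphlib import CycleError, TopologicalSorter


def run_order(stages: dict) -> list[str]:
    """Order stages so each runs after the stages that produce its dependencies."""
    producer = {}
    for name, spec in stages.items():
        for out in spec.get("outs", []):
            producer[out] = name

    sorter = TopologicalSorter()
    for name, spec in stages.items():
        # A stage's predecessors are the stages whose outputs it depends on.
        upstream = [producer[dep] for dep in spec.get("deps", []) if dep in producer]
        sorter.add(name, *upstream)
    return list(sorter.static_order())


# Option B, deliberately listed with train first: order still comes from deps/outs.
option_b = {
    "train": {"cmd": "python train.py", "deps": ["data/raw.csv"], "outs": ["model.pkl"]},
    "extract": {"cmd": "python extract.py", "outs": ["data/raw.csv"]},
}
print(run_order(option_b))  # ['extract', 'train']

# Option A: extract depends on its own output, so the graph contains a cycle.
option_a = {
    "train": {"cmd": "python train.py", "outs": ["model.pkl"]},
    "extract": {"cmd": "python extract.py", "deps": ["data/raw.csv"], "outs": ["data/raw.csv"]},
}
try:
    run_order(option_a)
except CycleError as err:
    print("cycle detected:", err.args[1])
```

Writing the dict with train first does not change the result. The ordering comes entirely from which stage produces each dependency.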

Data pipelines with DVC in MLOps - Time & Space Complexity

Pipeline size (n)	Stage runs per dvc repro (worst case: every stage out of date)
3 stages	3 runs
10 stages	10 runs
100 stages	100 runs
