Challenge - 5 Problems

💻 Command Output

intermediate

DVC Pipeline Stage Output

You run the command `dvc run -n preprocess -d data/raw.csv -o data/processed.csv python preprocess.py`. What will be the output of `dvc pipeline show` immediately after?

A. preprocess

B. No stages found

C. preprocess -> train

D. train
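
For reference, in DVC releases where `dvc run -n` is available (1.x and 2.x; later releases use `dvc stage add` for this, and expose the stage graph through `dvc dag`), the command in the question records one stage in dvc.yaml. A minimal sketch of the resulting entry, assuming dvc.yaml contained no other stages beforehand:

```yaml
# dvc.yaml after running the command from the question
# (assumption: the file did not already define other stages)
stages:
  preprocess:
    cmd: python preprocess.py   # the command given after the flags
    deps:
      - data/raw.csv            # from -d
    outs:
      - data/processed.csv      # from -o
```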

❓ Configuration

intermediate

Correct DVC Stage Definition in dvc.yaml

Which of the following dvc.yaml stage definitions correctly specifies a stage named 'train' that depends on 'data/processed.csv' and runs 'python train.py' producing 'model.pkl'?

A.
stages:
  train:
    cmd: python train.py
    deps:
      - data/processed.csv
    outs:
      - model.pkl

B.
stages:
  train:
    cmd: python train.py
    outs:
      - data/processed.csv
    deps:
      - model.pkl

C.
stages:
  train:
    cmd: python train.py
    deps:
      - model.pkl
    outs:
      - data/processed.csv

D.
stages:
  train:
    cmd: python train.py
    deps:
      - model.pkl
    outs:
      - model.pkl

❓ Troubleshoot

advanced

DVC Pipeline Reproduction Issue

You modified 'preprocess.py', but running `dvc repro` does not rerun the 'preprocess' stage. What is the most likely cause?

A. The output file 'data/processed.csv' was deleted manually.

B. The 'preprocess.py' file is not listed as a dependency in the dvc.yaml stage.

C. The DVC cache is full and needs cleaning.

D. The 'dvc.lock' file is missing.
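
As background for this scenario: `dvc repro` decides whether a stage is stale by comparing the paths tracked for that stage against the state recorded in dvc.lock, so a file that is not tracked cannot trigger a rerun. A minimal sketch of a 'preprocess' stage that tracks both its script and its input data, reusing the file names from the earlier questions:

```yaml
stages:
  preprocess:
    cmd: python preprocess.py
    deps:
      - preprocess.py        # the script itself; edits to it mark the stage as changed
      - data/raw.csv         # input data
    outs:
      - data/processed.csv
```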

🔀 Workflow

advanced

DVC Pipeline Stage Execution Order

Given a pipeline with the stages 'download' -> 'preprocess' -> 'train', which command reproduces only the 'train' stage and the stages it depends on?

A. dvc repro

B. dvc repro preprocess

C. dvc repro download

D. dvc repro train
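
The stage order in such a pipeline is not declared explicitly; DVC derives it by matching one stage's outs to another stage's deps. A sketch of how the three stages could be chained in dvc.yaml, where the script name download.py and the intermediate file names are assumptions chosen for illustration:

```yaml
stages:
  download:
    cmd: python download.py        # hypothetical script name
    outs:
      - data/raw.csv
  preprocess:
    cmd: python preprocess.py
    deps:
      - data/raw.csv               # produced by download
    outs:
      - data/processed.csv
  train:
    cmd: python train.py
    deps:
      - data/processed.csv         # produced by preprocess
    outs:
      - model.pkl
```

When a stage name is passed to `dvc repro`, DVC checks that stage together with everything upstream of it in this graph and reruns only what has changed.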

✅ Best Practice

expert

Best Practice for Large Data Files in DVC Pipelines

You have very large raw data files that rarely change but are needed for multiple pipeline runs. What is the best practice to manage these files with DVC?

A. Commit the large raw data files directly to Git for easy access.

B. Exclude the large raw data files from DVC and Git, and manually copy them before each run.

C. Track the large raw data files with DVC and store them in remote storage, avoiding committing them to Git.

D. Compress the large raw data files and commit the compressed files to Git.
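
As a rough illustration of what tracking a file with DVC leaves in Git: running `dvc add` on a data file writes a small YAML pointer file next to it, while the data itself goes to the DVC cache and, after `dvc push`, to remote storage. A sketch of such a pointer file, assuming a data file named data/raw.csv; the hash and size are placeholders, and the exact set of fields varies between DVC versions:

```yaml
# data/raw.csv.dvc (small pointer file committed to Git in place of the data)
outs:
  - md5: 0123456789abcdef0123456789abcdef   # placeholder, not a real hash
    size: 1048576                           # placeholder size in bytes
    path: raw.csv
```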

Practice

1. What is the main purpose of using dvc repro in a DVC pipeline?

easy

A. To delete all pipeline data and cache

B. To initialize a new DVC repository

C. To reproduce pipeline stages and update outputs if inputs changed

D. To manually edit pipeline stage commands

Data pipelines with DVC in MLOps - Practice Problems & Coding Challenges

Solution

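Step 1: Understand the role of `dvc repro`

`dvc repro` walks the stages defined in dvc.yaml and compares each stage's dependencies and outputs with the state recorded in dvc.lock.

Step 2: Effect of running `dvc repro`

Stages whose inputs changed are rerun and their outputs updated; stages that are unchanged are skipped.

Final Answer:

C. To reproduce pipeline stages and update outputs if inputs changed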

Quick Check:

Solution

Step 1: Identify required flags for stage creation

Step 2: Check which option includes all required flags correctly

Final Answer:

Quick Check:

Solution

Step 1: Identify dependencies of the preprocess stage

Step 2: Effect of changing `data/raw` on `dvc repro`

Final Answer:

Quick Check:

Solution

Step 1: Understand the error message

Step 2: Common causes of missing dependency errors

Final Answer:

Quick Check:

Solution

Step 1: Define extract stage with output only

Step 2: Define train stage depending on extract output

Step 3: Confirm correct order and commands

Final Answer:

Quick Check:
