Data pipelines with DVC in MLOps - Practice Problems & Coding Challenges

Challenge - 5 Problems
💻 Command Output
intermediate
DVC Pipeline Stage Output
You run the command dvc run -n preprocess -d data/raw.csv -o data/processed.csv python preprocess.py in a project with no other stages. Which stages will dvc dag show immediately afterwards? (dvc dag is the current replacement for the older dvc pipeline show.)
A. preprocess
B. No stages found
C. preprocess -> train
D. train
💡 Hint
Think about what stages are created after running dvc run.
Configuration
intermediate
Correct DVC Stage Definition in dvc.yaml
Which of the following dvc.yaml stage definitions correctly specifies a stage named 'train' that depends on 'data/processed.csv', runs 'python train.py', and produces 'model.pkl'?
A.
stages:
  train:
    cmd: python train.py
    deps:
      - data/processed.csv
    outs:
      - model.pkl
B.
stages:
  train:
    cmd: python train.py
    outs:
      - data/processed.csv
    deps:
      - model.pkl
C.
stages:
  train:
    cmd: python train.py
    deps:
      - model.pkl
    outs:
      - data/processed.csv
D.
stages:
  train:
    cmd: python train.py
    deps:
      - model.pkl
    outs:
      - model.pkl
💡 Hint
Remember: dependencies (deps) are the inputs a stage reads, and outputs (outs) are the files it produces.
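The inputs-versus-results distinction can be sketched as a small sanity check. This is an illustrative helper, not part of DVC: the function name check_stage and the rule set are assumptions made for this example, and the dictionaries simply mirror the shape of a dvc.yaml stage entry.

```python
def check_stage(stage: dict) -> list[str]:
    """Return a list of problems found in a stage definition (hypothetical checker)."""
    problems = []
    if not stage.get("cmd"):
        problems.append("missing cmd")
    # A path cannot sensibly be both something the stage reads and something it writes.
    overlap = set(stage.get("deps", [])) & set(stage.get("outs", []))
    if overlap:
        problems.append(f"path listed as both dependency and output: {sorted(overlap)}")
    return problems


# Inputs under deps, results under outs.
consistent = {
    "cmd": "python train.py",
    "deps": ["data/processed.csv"],
    "outs": ["model.pkl"],
}
# The same file appears on both sides.
circular = {
    "cmd": "python train.py",
    "deps": ["model.pkl"],
    "outs": ["model.pkl"],
}

consistent_problems = check_stage(consistent)
circular_problems = check_stage(circular)
print(consistent_problems)
print(circular_problems)
```

A check like this only catches structural mistakes. It cannot tell whether deps and outs have been swapped, which still requires knowing what the command actually reads and writes.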
Troubleshoot
advanced
DVC Pipeline Reproduction Issue
You modified 'preprocess.py' but running dvc repro does not rerun the 'preprocess' stage. What is the most likely cause?
A. The output file 'data/processed.csv' was deleted manually.
B. The 'preprocess.py' file is not listed as a dependency in the dvc.yaml stage.
C. The DVC cache is full and needs cleaning.
D. The 'dvc.lock' file is missing.
💡 Hint
DVC tracks changes only in declared dependencies.
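The idea behind this hint can be sketched as a toy change detector. It is a simplified model and not DVC's implementation: the function names are made up for this example, SHA-256 is an arbitrary stand-in for whatever checksum DVC records, and the recorded dictionary stands in for the hashes a lock file would hold.

```python
import hashlib
import tempfile
from pathlib import Path


def file_hash(path: Path) -> str:
    """Return a hex digest of the file's contents (SHA-256 chosen for illustration)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stage_changed(deps: list[Path], recorded: dict[str, str]) -> bool:
    """A stage counts as changed only if a declared dependency's hash differs."""
    return any(file_hash(dep) != recorded.get(str(dep)) for dep in deps)


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw.csv"
    script = Path(tmp) / "preprocess.py"
    raw.write_text("a,b\n1,2\n")
    script.write_text("print('v1')\n")

    # Hypothetical stage that declares only the data file as a dependency.
    data_only = [raw]
    # The same stage with the script declared as well.
    data_and_script = [raw, script]

    # Snapshot of hashes taken after the last successful run.
    recorded = {str(p): file_hash(p) for p in data_and_script}

    # Edit the script after the snapshot was taken.
    script.write_text("print('v2')\n")

    missed = stage_changed(data_only, recorded)
    detected = stage_changed(data_and_script, recorded)

print(missed, detected)  # False True
```

In this model the edit to the script is invisible to the stage that lists only the data file, because files outside the declared list are never hashed.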
🔀 Workflow
advanced
DVC Pipeline Stage Execution Order
Given a pipeline with stages 'download' -> 'preprocess' -> 'train', which command explicitly targets the 'train' stage, reproducing it together with the upstream stages it depends on?
A. dvc repro
B. dvc repro preprocess
C. dvc repro download
D. dvc repro train
💡 Hint
Reproducing a stage also reproduces the upstream stages it depends on, when they are out of date.
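How a target stage pulls in its upstream stages can be sketched with a depth-first walk over the stage graph. This is a minimal model under stated assumptions, not DVC's code: the function stages_to_check and the upstream mapping are hypothetical, and the sketch only decides which stages are considered and in what order, not whether each one is actually out of date.

```python
def stages_to_check(target: str, upstream: dict[str, list[str]]) -> list[str]:
    """Return the target stage and everything upstream of it, in run order."""
    ordered: list[str] = []
    seen: set[str] = set()

    def visit(stage: str) -> None:
        if stage in seen:
            return
        seen.add(stage)
        # Visit parents first so they appear before the stage that needs them.
        for parent in upstream.get(stage, []):
            visit(parent)
        ordered.append(stage)

    visit(target)
    return ordered


# The pipeline from the question: download -> preprocess -> train.
upstream = {"download": [], "preprocess": ["download"], "train": ["preprocess"]}

print(stages_to_check("train", upstream))       # ['download', 'preprocess', 'train']
print(stages_to_check("preprocess", upstream))  # ['download', 'preprocess']
```

Targeting an earlier stage never reaches the later ones, since the walk only follows edges towards a stage's inputs.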
Best Practice
expert
Best Practice for Large Data Files in DVC Pipelines
You have very large raw data files that rarely change but are needed for multiple pipeline runs. What is the best practice to manage these files with DVC?
A. Commit the large raw data files directly to Git for easy access.
B. Exclude the large raw data files from DVC and Git, and manually copy them before each run.
C. Track the large raw data files with DVC and store them in remote storage, avoiding committing them to Git.
D. Compress the large raw data files and commit the compressed files to Git.
💡 Hint
Think about versioning large files efficiently without bloating Git.