Challenge - 5 Problems
DVC Dataset Master
💻 Command Output
Difficulty: intermediate
DVC add command output
You run the command 'dvc add data/raw_images' in your project folder. What is the expected output?
💡 Hint
Think about what happens when you add a directory with files to DVC.
Explanation
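The 'dvc add' command computes a checksum (an MD5 hash) of the specified file or directory, stores the data in the DVC cache, and writes a small .dvc file (here, data/raw_images.dvc) that records that hash. It also adds the tracked path to a .gitignore file (data/.gitignore) so that Git ignores the data itself, and it ends by suggesting a 'git add' command for those two files. It does not raise an error when the directory exists and contains files.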
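To make the checksum step concrete, here is a minimal Python sketch of hashing a directory along the lines the explanation describes: each file gets an MD5 digest, and the directory gets one hash derived from the sorted list of those digests. The function names file_md5 and dir_checksum are hypothetical, and the combination step is a simplification: DVC's real '.dir' hash uses its own serialization, so these values will not match a real .dvc file.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dir_checksum(root: Path) -> str:
    """Hash a directory from a sorted listing of (relative path, file MD5) pairs.

    Simplification (hypothetical helper): DVC's real '.dir' hash serializes its
    listing differently, so this only illustrates the idea.
    """
    entries = [
        {"relpath": p.relative_to(root).as_posix(), "md5": file_md5(p)}
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]
    listing = json.dumps(entries, sort_keys=True).encode("utf-8")
    return hashlib.md5(listing).hexdigest() + ".dir"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "raw_images"
        root.mkdir()
        (root / "a.png").write_bytes(b"fake image A")
        (root / "b.png").write_bytes(b"fake image B")
        print(dir_checksum(root))  # 32 hex characters followed by ".dir"
```

Because the listing is sorted, the same files always produce the same hash, and changing any single file changes the directory hash; that is the property that lets a tool like DVC notice that a tracked directory was modified.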
🧠 Conceptual
Difficulty: intermediate
Purpose of .dvc files
What is the main purpose of the .dvc files created when you track datasets with DVC?
💡 Hint
Think about how DVC tracks large files without putting them in Git.
Explanation
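DVC uses .dvc files to store small metadata about the tracked data, such as its checksum (MD5 hash), size, and path. Git versions these lightweight files, while the large data itself stays in the DVC cache and remote storage, so you get version control without storing large files in Git.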
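The sketch below illustrates why these files are cheap to keep in Git: it renders a .dvc-style metadata file for one data file. The helper render_dvc_file is hypothetical, and the field layout (outs, md5, size, hash, path) is an assumption modeled on DVC 3.x files for a single tracked file; real files can carry extra fields, and tracked directories use a '.dir' hash plus an nfiles entry.

```python
import hashlib
import tempfile
from pathlib import Path


def render_dvc_file(data_path: Path) -> str:
    """Render .dvc-style YAML metadata for a single tracked file.

    Assumed layout, modeled on DVC 3.x single-file .dvc files; real files may
    contain additional fields.
    """
    data = data_path.read_bytes()
    md5 = hashlib.md5(data).hexdigest()
    return (
        "outs:\n"
        f"- md5: {md5}\n"
        f"  size: {len(data)}\n"
        "  hash: md5\n"
        f"  path: {data_path.name}\n"
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        big = Path(tmp) / "big.bin"
        big.write_bytes(b"x" * 1_000_000)  # 1 MB of stand-in "data"
        text = render_dvc_file(big)
        print(text)
        print(f"data: {big.stat().st_size} bytes, metadata: {len(text.encode())} bytes")
```

A 1 MB data file yields metadata of well under 200 bytes, so Git only ever stores the hash, size, and path, never the data.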
🔀 Workflow
Difficulty: advanced
Correct sequence to track and push a dataset with DVC
Arrange the steps in the correct order to track a new dataset folder, data/images, and push it to remote storage using DVC.
💡 Hint
Think about adding data, then staging files for Git, committing, then pushing data to remote.
Explanation
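First, add the data to DVC, then stage the generated .dvc file and .gitignore for Git, commit the changes, and finally push the data to remote storage:
1. dvc add data/images
2. git add data/images.dvc data/.gitignore
3. git commit -m "Track data/images with DVC"
4. dvc push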
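The same sequence can be scripted. This minimal Python sketch only assembles the four commands in order and prints them; with dry_run=False it executes them, assuming git and dvc are installed, the folder is already a Git and DVC repository, and a default DVC remote is configured. The helper names track_and_push_commands and run are hypothetical.

```python
import posixpath
import shlex
import subprocess
from typing import List


def track_and_push_commands(data_dir: str, message: str) -> List[List[str]]:
    """Return, in order, the commands that version data_dir with DVC and push it."""
    dvc_file = f"{data_dir}.dvc"
    gitignore = posixpath.join(posixpath.dirname(data_dir), ".gitignore")
    return [
        ["dvc", "add", data_dir],             # 1. hash the data, cache it, write the .dvc file
        ["git", "add", dvc_file, gitignore],  # 2. stage the small metadata files for Git
        ["git", "commit", "-m", message],     # 3. record this data version in Git history
        ["dvc", "push"],                      # 4. upload the cached data to the default remote
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run(track_and_push_commands("data/images", "Track data/images with DVC"))
```

Running with check=True makes the script stop at the first failure, so a failed 'dvc add' never leads to a commit that references data which was not cached.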
❓ Troubleshoot
Difficulty: advanced
DVC push error diagnosis
You run 'dvc push' but get the error:
ERROR: failed to upload data/images: Access denied to remote storage
What is the most likely cause?
💡 Hint
Think about what 'Access denied' means when pushing data.
Explanation
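Access denied errors usually mean that the remote storage credentials are missing, expired, or incorrect, or that they lack write permission on the remote location, so DVC cannot upload the data. Check which remote DVC is pushing to with 'dvc remote list', then confirm that valid credentials with write access are available to DVC.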
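As a first diagnostic step, assuming the remote is an S3 bucket, this sketch reports which of the standard AWS credential environment variables are unset. The helper name is hypothetical, and the check is deliberately limited: credentials can also come from ~/.aws/credentials, a named profile, an IAM role, or DVC's own remote configuration, and expired keys or keys without write permission will pass this check yet still cause "Access denied".

```python
import os
from typing import List, Mapping

# Standard AWS SDK credential variables (assumption: the DVC remote is S3).
S3_CREDENTIAL_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_s3_env_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the credential variables that are unset or empty in env.

    An empty result does not prove access: keys may be expired or lack write
    permission, and other credential sources are not inspected here.
    """
    return [name for name in S3_CREDENTIAL_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_s3_env_credentials()
    if missing:
        print("Not set in the environment:", ", ".join(missing))
    else:
        print("AWS key variables are set; check their validity and bucket permissions next.")
```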
✅ Best Practice
Difficulty: expert
Best practice for dataset versioning with DVC
Which practice ensures reliable dataset versioning and collaboration when using DVC in a team?
💡 Hint
Think about how to keep data versions consistent and shareable in a team.
Explanation
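The best practice is to track datasets with DVC, commit the small .dvc files to Git, and push the large data files to a shared remote storage that every team member can access. Teammates can then run 'git pull' followed by 'dvc pull' to get exactly the data version that matches each commit.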
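This works because the hash committed in each .dvc file pins an exact data version. The sketch below, a simplified stand-in for the idea behind 'dvc status', checks whether a local file still matches the MD5 recorded in its .dvc text. The helpers recorded_md5 and is_in_sync are hypothetical, the regex-based extraction is an assumption that only handles single-file .dvc files (no YAML parser, no '.dir' hashes), and DVC itself does much more, such as cache lookups and directory listings.

```python
import hashlib
import re
import tempfile
from pathlib import Path
from typing import Optional

# Matches a single-file md5 entry such as "- md5: <32 hex chars>" (no ".dir" hashes).
MD5_LINE = re.compile(r"^-?\s*md5:\s*([0-9a-f]{32})\s*$", re.MULTILINE)


def recorded_md5(dvc_text: str) -> Optional[str]:
    """Return the first md5 value found in .dvc file text, or None."""
    match = MD5_LINE.search(dvc_text)
    return match.group(1) if match else None


def is_in_sync(data_file: Path, dvc_text: str) -> bool:
    """True if the local file's MD5 equals the hash pinned in the .dvc text."""
    actual = hashlib.md5(data_file.read_bytes()).hexdigest()
    return recorded_md5(dvc_text) == actual


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        labels = Path(tmp) / "labels.csv"
        labels.write_bytes(b"id,label\n1,cat\n")
        pinned = hashlib.md5(b"id,label\n1,cat\n").hexdigest()
        dvc_text = f"outs:\n- md5: {pinned}\n  size: 15\n  path: labels.csv\n"
        print(is_in_sync(labels, dvc_text))  # True: matches the committed version
        labels.write_bytes(b"id,label\n1,dog\n")
        print(is_in_sync(labels, dvc_text))  # False: local edit no longer matches
```

After 'git pull' brings updated .dvc files and 'dvc pull' fetches the matching data from the shared remote, a check like this passes for every tracked file, which is what keeps each teammate's data consistent with the commit they have checked out.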