B. Only the data.csv.dvc pointer file is pushed to the Git remote
C. The Git repository is cloned to remote storage
D. The actual data file data.csv is uploaded to remote storage
Solution
Step 1: Understand dvc push behavior
dvc push uploads the actual data files tracked by DVC (stored in the local DVC cache) to the configured remote storage. It does not push anything to the Git remote.
Step 2: Differentiate Git and DVC storage roles
Git versions small pointer files like data.csv.dvc, which reach the Git remote through git push. DVC stores the large data file itself in its cache and uploads it to remote storage through dvc push.
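To make the split between the two storage roles concrete, here is a toy model in Python. It is a sketch, not DVC's implementation: `toy_add` and `toy_push` are hypothetical stand-ins for `dvc add` and `dvc push`. The pointer format is simplified to a hash and a path, and the cache layout (hash split into a two-character directory) is only similar in spirit to DVC's. The point it shows is that "push" moves the content-addressed data object to remote storage, while the pointer file stays behind for Git.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path) -> str:
    """Hash the file contents, as a content-addressed store would."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def toy_add(data: Path, cache: Path) -> Path:
    """Hypothetical 'add': copy data into the local cache, write a small pointer file."""
    digest = file_md5(data)
    # Cache objects are keyed by hash: <first 2 hex chars>/<remaining chars>.
    obj = cache / digest[:2] / digest[2:]
    obj.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(data, obj)
    pointer = data.with_name(data.name + ".dvc")
    # Simplified pointer: only hash and path (real .dvc files carry more fields).
    pointer.write_text(f"outs:\n- md5: {digest}\n  path: {data.name}\n")
    return pointer


def toy_push(cache: Path, remote: Path) -> list[str]:
    """Hypothetical 'push': upload cached data objects to remote storage.

    Pointer files are never touched here; in the real workflow they travel via Git.
    """
    pushed = []
    for obj in sorted(cache.rglob("*")):
        if obj.is_file():
            dest = remote / obj.relative_to(cache)
            if not dest.exists():  # skip objects the remote already has
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copyfile(obj, dest)
                pushed.append(obj.parent.name + obj.name)  # rebuild the full hash
    return pushed


root = Path(tempfile.mkdtemp())
workspace, cache, remote = root / "ws", root / "cache", root / "remote"
workspace.mkdir()
data = workspace / "data.csv"
data.write_text("id,value\n1,42\n")

pointer = toy_add(data, cache)   # pointer stays in the workspace for Git
pushed = toy_push(cache, remote)  # the data object itself goes to the remote
print(pointer.read_text())
print(pushed == [file_md5(data)])  # True
```

In the real tool the corresponding sequence is `dvc add data.csv`, then `git add data.csv.dvc .gitignore` and `git commit`, then `dvc push` for the data and `git push` for the pointer.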
Final Answer:
The actual data file data.csv is uploaded to remote storage -> Option D
Quick Check:
dvc push uploads data files remotely [OK]
Hint: dvc push uploads the big data files; pointer files reach Git through git push [OK]
Common Mistakes:
Thinking dvc push sends pointer files or other Git files to the Git remote
Confusing dvc push with git push
Assuming data files are deleted after push
4. You ran dvc add dataset.csv but forgot to commit the generated dataset.csv.dvc file to Git. What problem will you face?
medium
A. DVC will not track the data file until the pointer file is committed
B. The data file will be deleted automatically
C. Git will track the data file instead of DVC
D. No problem; DVC tracks data without Git commits
Solution
Step 1: Understand the role of the .dvc pointer file
The dataset.csv.dvc file is a small text pointer, meant to be versioned in Git, that records the data file's path and content hash (its version).
Step 2: Consequence of not committing the pointer file
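The pointer file exists on your machine, but until it is committed the data version is not recorded in Git history. Collaborators who clone or pull the repository receive no pointer, so dvc pull has nothing to fetch for them, and you cannot return to that data version later by checking out an older commit. DVC tracking is therefore incomplete.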
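A minimal sketch of why the commit matters, assuming a toy model of Git in which a clone sees only committed files. `ToyGitRepo` and `pullable_pointers` are hypothetical names for illustration, not part of Git or DVC. The data file itself never enters this model, mirroring how `dvc add` puts it in `.gitignore`.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGitRepo:
    """Hypothetical stand-in for a Git repository: only committed files travel."""
    working_tree: dict[str, str] = field(default_factory=dict)
    committed: dict[str, str] = field(default_factory=dict)

    def write(self, path: str, content: str) -> None:
        self.working_tree[path] = content

    def commit(self, *paths: str) -> None:
        for p in paths:
            self.committed[p] = self.working_tree[p]

    def clone(self) -> "ToyGitRepo":
        # A clone receives the committed snapshot, never uncommitted local files.
        snapshot = dict(self.committed)
        return ToyGitRepo(working_tree=dict(snapshot), committed=dict(snapshot))


def pullable_pointers(repo: ToyGitRepo) -> list[str]:
    """Pointer files a collaborator's pull could resolve into data."""
    return sorted(p for p in repo.working_tree if p.endswith(".dvc"))


repo = ToyGitRepo()
repo.write("train.py", "print('train')\n")
repo.write("dataset.csv.dvc", "outs:\n- md5: <hash placeholder>\n  path: dataset.csv\n")

repo.commit("train.py")  # the pointer file is forgotten
print(pullable_pointers(repo.clone()))  # []

repo.commit("dataset.csv.dvc")  # now the data version is part of history
print(pullable_pointers(repo.clone()))  # ['dataset.csv.dvc']
```

Until the pointer is committed, a collaborator's clone has nothing that tells DVC which data version to download.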
Final Answer:
DVC will not track the data file until the pointer file is committed -> Option A
Quick Check:
Pointer file commit = DVC tracking active [OK]
Hint: Always commit .dvc pointer files after dvc add [OK]
Common Mistakes:
Assuming data files are tracked without pointer commits
Thinking data files get deleted automatically
Believing Git tracks large data files directly
5. You have a large dataset tracked by DVC and a remote storage configured. Your teammate cloned the Git repo but the data files are missing locally. Which command should they run to get the data files?
hard
A. dvc pull
B. dvc add
C. git pull
D. git clone
Solution
Step 1: Understand what dvc pull does
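dvc pull reads the hashes recorded in the .dvc pointer files, downloads the matching data from remote storage into the local cache, and checks the files out into the workspace. It is equivalent to dvc fetch followed by dvc checkout.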
Step 2: Differentiate from Git commands
git pull updates code and pointer files but does not fetch the data files. dvc add starts tracking new data, and git clone only creates the initial copy of the repository (code and pointers, without the data).
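The two halves of a pull (fetch, then checkout) can be sketched as a toy Python model. `parse_pointer` and `toy_pull` are hypothetical helpers, not DVC's API. The simplified pointer format (just `md5` and `path`) and the two-character cache layout are assumptions made for illustration. The scenario mirrors the question: a fresh clone contains only the pointer file, and the data lives in remote storage.

```python
import hashlib
import re
import shutil
import tempfile
from pathlib import Path


def _field(text: str, pattern: str) -> str:
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pointer file has no field matching {pattern!r}")
    return match.group(1)


def parse_pointer(pointer: Path) -> tuple[str, str]:
    """Read md5 and path from a simplified pointer (real .dvc files are YAML with more fields)."""
    text = pointer.read_text()
    return _field(text, r"md5:\s*([0-9a-f]{32})"), _field(text, r"path:\s*(\S+)")


def toy_pull(pointer: Path, remote: Path, cache: Path) -> Path:
    """Hypothetical 'pull' = fetch (remote -> cache) + checkout (cache -> workspace)."""
    digest, rel_path = parse_pointer(pointer)
    key = Path(digest[:2]) / digest[2:]
    cached = cache / key
    if not cached.exists():  # fetch: download only objects missing locally
        cached.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(remote / key, cached)
    target = pointer.parent / rel_path
    shutil.copyfile(cached, target)  # checkout: materialize the file in the workspace
    # Verify the restored bytes match the version recorded in the pointer.
    if hashlib.md5(target.read_bytes()).hexdigest() != digest:
        raise ValueError(f"hash mismatch for {rel_path}")
    return target


root = Path(tempfile.mkdtemp())
remote, cache, clone = root / "remote", root / "cache", root / "clone"
clone.mkdir()

# Remote storage already holds the data object (uploaded earlier by a push).
payload = b"id,value\n1,42\n"
digest = hashlib.md5(payload).hexdigest()
(remote / digest[:2]).mkdir(parents=True)
(remote / digest[:2] / digest[2:]).write_bytes(payload)

# The teammate's fresh clone holds only the pointer file, not the data.
pointer = clone / "data.csv.dvc"
pointer.write_text(f"outs:\n- md5: {digest}\n  path: data.csv\n")

restored = toy_pull(pointer, remote, cache)
print(restored.read_bytes() == payload)  # True
```

Keeping fetch and checkout separate in the sketch reflects why a second pull is cheap: objects already in the cache are not downloaded again.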
Final Answer:
dvc pull -> Option A
Quick Check:
Use dvc pull to fetch data files locally [OK]
Hint: Use dvc pull to download data after cloning repo [OK]