
Data Validation in a CI Pipeline (MLOps) - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Purpose of Data Validation in CI Pipeline

Why is data validation important in a Continuous Integration (CI) pipeline for machine learning projects?

A. To speed up the model training by skipping data checks
B. To reduce the size of the dataset by removing random samples
C. To automatically deploy models without human review
D. To ensure that the data used for training meets quality standards before model training starts
💡 Hint

Think about what could happen if bad data enters the training process.

💻 Command Output
intermediate
Output of Data Validation Step in CI

What is the expected output when a data validation step in a CI pipeline detects missing values in a critical feature?

run_data_validation --input data.csv --check missing_values
A. SyntaxError: invalid syntax in command.
B. Validation passed: No missing values detected.
C. Validation failed: Missing values found in column 'age'.
D. Error: File not found 'data.csv'.
💡 Hint

Consider what happens if the validation finds a problem in the data.
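
The kind of check such a step performs can be sketched in a few lines of Python. This is an illustrative sketch only: the internals of `run_data_validation` are not shown here, so the function names `find_missing` and `report`, the inline sample data, and the message wording (borrowed from the options above) are all assumptions.

```python
import csv
import io


def find_missing(rows, column):
    """Return the 1-based data row numbers where `column` is empty or absent."""
    missing = []
    for row_number, row in enumerate(rows, start=1):
        value = row.get(column)
        if value is None or value.strip() == "":
            missing.append(row_number)
    return missing


def report(rows, column):
    """Build a pass/fail message for one column (wording is an assumption)."""
    if find_missing(rows, column):
        return f"Validation failed: Missing values found in column '{column}'."
    return "Validation passed: No missing values detected."


# Hypothetical sample standing in for data.csv; the second record has no age.
SAMPLE_CSV = "name,age\nalice,34\nbob,\ncarol,29\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

print(report(rows, "age"))
print(report(rows, "name"))
```

Reading the rows with `csv.DictReader` keeps the sketch dependency-free; a real pipeline would more likely load the file with a dataframe library and check every critical feature, not just one.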

Configuration
advanced
Configuring Data Validation in a CI YAML Pipeline

Which YAML snippet correctly configures a data validation step in a CI pipeline that runs a Python script validate_data.py and fails the pipeline if validation fails?

A.
steps:
  - name: Data Validation
    run: python validate_data.py

B.
steps:
  - name: Data Validation
    run: python validate_data.py
    fail-on-error: true

C.
steps:
  - name: Data Validation
    run: python validate_data.py
    continue-on-error: true

D.
steps:
  - name: Data Validation
    run: python validate_data.py
    fail-fast: true
💡 Hint

By default, a CI step fails when its command exits with a non-zero exit code.
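
How a script reports success or failure to whatever launched it can be seen by running small child processes and reading their exit codes. The sketch below is a hedged illustration, not the behaviour of any particular CI system: `run_step` is a hypothetical helper, and the two one-line snippets merely stand in for a passing and a failing `validate_data.py`.

```python
import subprocess
import sys


def run_step(code):
    """Run a Python snippet as a child process and return its exit code."""
    completed = subprocess.run([sys.executable, "-c", code])
    return completed.returncode


# Hypothetical stand-ins for validate_data.py: one passes, one fails.
PASSING = "import sys; sys.exit(0)"
FAILING = "import sys; print('validation failed', file=sys.stderr); sys.exit(1)"

for label, snippet in [("passing", PASSING), ("failing", FAILING)]:
    exit_code = run_step(snippet)
    status = "step succeeds" if exit_code == 0 else "step fails"
    print(f"{label}: exit code {exit_code} -> {status}")
```

The design point is that the validation script itself must call `sys.exit` with a non-zero value (or raise an uncaught exception) when a check fails; printing an error message alone leaves the exit code at 0.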

Troubleshoot
advanced
Troubleshooting Data Validation Failure in CI

A CI pipeline data validation step fails with the error: KeyError: 'target_column'. What is the most likely cause?

A. The dataset does not contain the required 'target_column' feature.
B. The validation script has a syntax error in the code.
C. The CI runner does not have Python installed.
D. The pipeline configuration YAML is missing the data validation step.
💡 Hint

A KeyError usually means a lookup used a key that is missing from a dictionary, or a column name that is missing from a DataFrame.
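
A small reproduction makes the error easier to recognise. The sketch below uses a plain dictionary as a hypothetical record; the column list `REQUIRED_COLUMNS` and the helper `missing_columns` are assumptions made for illustration, not part of any validation script referenced here.

```python
REQUIRED_COLUMNS = ["age", "target_column"]  # hypothetical schema


def missing_columns(record, required):
    """Return the required names that are not present in the record."""
    return [name for name in required if name not in record]


# Hypothetical record that lacks the 'target_column' key.
record = {"age": 34, "income": 52000}

# Direct access reproduces the error from the CI log.
try:
    record["target_column"]
except KeyError as error:
    print(f"KeyError raised for key: {error}")

# Checking the schema first gives a clearer failure message instead.
absent = missing_columns(record, REQUIRED_COLUMNS)
if absent:
    print(f"Validation failed: missing required columns {absent}")
```

Checking for required columns before touching them turns a bare traceback into a message that names what the dataset lacks.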

🔀 Workflow
expert
Order of Steps in a CI Pipeline with Data Validation

What is the correct order of steps in a CI pipeline that includes data validation, model training, and deployment?

A. 2, 1, 3
B. 2, 3, 1
C. 1, 2, 3
D. 3, 2, 1
💡 Hint

Think about what must happen before training and deployment.
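
The idea that an early stage gates the later ones can be sketched as a tiny stage runner. Everything here is an assumption made for illustration: `run_pipeline` is a hypothetical helper, the stage functions are placeholders that only return a boolean, and the stage names do not correspond to the numbered steps of the question.

```python
def run_pipeline(stages):
    """Run (name, callable) stages in order; stop at the first that returns False."""
    completed = []
    for name, stage in stages:
        if not stage():
            print(f"{name} failed; skipping remaining stages")
            break
        completed.append(name)
    return completed


# Hypothetical placeholder stages; real ones would do actual work.
def validation_ok():
    return True


def validation_bad():
    return False


def train():
    return True


def deploy():
    return True


healthy = [("data validation", validation_ok),
           ("model training", train),
           ("deployment", deploy)]
broken = [("data validation", validation_bad),
          ("model training", train),
          ("deployment", deploy)]

print(run_pipeline(healthy))
print(run_pipeline(broken))
```

Because the runner stops at the first failing stage, bad data never reaches training, and an untrained or unvalidated model never reaches deployment.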