Data validation in CI pipeline in MLOps - Step-by-Step Execution

Process Flow - Data validation in CI pipeline
1. Code Commit / Data Update
2. Trigger CI Pipeline
3. Run Data Validation Tests
4. Continue Pipeline
5. Deploy Model / Data
This flow shows how a data update triggers a CI pipeline that runs data validation tests. If tests pass, the pipeline continues; if they fail, it stops and alerts.
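The flow above can be sketched as a small gate that a CI job runs after a data update. This is a minimal sketch, assuming pandas is available: `run_gate`, the sample batch, and the use of stderr as a stand-in for an alert are all hypothetical, not part of the original lesson. The underlying idea is that CI systems generally treat a non-zero exit code as a failed step, which is what stops the pipeline.

```python
import sys
import pandas as pd

def validate_data(data: pd.DataFrame) -> bool:
    """Same checks as the lesson's sample: no missing values, no negative ages."""
    if data.isnull().sum().sum() > 0:
        return False
    if (data["age"] < 0).any():
        return False
    return True

def run_gate(data: pd.DataFrame) -> int:
    """Hypothetical helper: map the validation result to a process exit code."""
    if validate_data(data):
        print("Data validation passed; pipeline may continue.")
        return 0
    # Writing to stderr stands in for a real alert (email, chat message, etc.).
    print("Data validation FAILED; stopping pipeline.", file=sys.stderr)
    return 1

# Illustrative batch; a real job would load the updated dataset instead.
batch = pd.DataFrame({"age": [34, 27, 51]})
exit_code = run_gate(batch)
# In a CI script, finish with sys.exit(exit_code) so a failure halts the job.
```

Returning the exit code from a function, rather than exiting inside it, keeps the gate easy to test on its own.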
Execution Sample
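import pandas as pd

def validate_data(data):
    # Reject the batch if any cell is missing.
    if data.isnull().sum().sum() > 0:
        return False
    # Reject the batch if any age is negative.
    if (data['age'] < 0).any():
        return False
    return True

# Hypothetical sample batch (not in the original): no missing values, one negative age.
new_data = pd.DataFrame({'age': [25, -3, 40]})
result = validate_data(new_data)  # False for this sample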
This code checks if the data has missing values or negative ages and returns True if data is valid, False otherwise.
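One detail the sample leaves implicit: `data['age']` raises a `KeyError` if the column is absent, so the validation would crash instead of returning False. The variant below is a hedged sketch, assuming you would rather report such a batch as invalid; `validate_data_safe` and its `required_columns` parameter are hypothetical names introduced for illustration.

```python
import pandas as pd

def validate_data_safe(data: pd.DataFrame, required_columns=("age",)) -> bool:
    # Hypothetical extension: treat a missing required column as invalid data.
    if any(col not in data.columns for col in required_columns):
        return False
    # The two checks from the original sample follow unchanged.
    if data.isnull().sum().sum() > 0:
        return False
    if (data["age"] < 0).any():
        return False
    return True
```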
Process Table
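| Step | Check | Condition | Result | Action |
|------|-------|-----------|--------|--------|
| 1 | Check missing values | data.isnull().sum().sum() > 0 | False | Continue |
| 2 | Check negative ages | (data['age'] < 0).any() | True | Fail validation |
| 3 | Return validation result | N/A | False | Stop pipeline and alert |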
💡 Validation failed because of negative age values, so the pipeline stops to prevent bad data from being deployed.
Status Tracker
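| Variable | Start | After Step 1 | After Step 2 | Final |
|----------|-------|--------------|--------------|-------|
| data.isnull().sum().sum() | N/A | 0 | 0 | 0 |
| (data['age'] < 0).any() | N/A | N/A | True | True |
| result | N/A | N/A | N/A | False |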
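The tracked values can be reproduced by evaluating each condition separately. This is a small sketch, assuming a hypothetical batch chosen to match the trace (no missing values, one negative age); the variable names `missing_count` and `has_negative_age` are illustrative.

```python
import pandas as pd

# Hypothetical batch chosen to match the trace: no missing values, one negative age.
new_data = pd.DataFrame({"age": [25, -3, 40]})

missing_count = int(new_data.isnull().sum().sum())     # step 1: count missing cells
has_negative_age = bool((new_data["age"] < 0).any())   # step 2: any negative age?
result = not (missing_count > 0 or has_negative_age)   # step 3: overall verdict

print(missing_count, has_negative_age, result)  # prints: 0 True False
```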
Key Moments - 2 Insights
Why does the pipeline stop even though there are no missing values?
Because the negative-age check fails at step 2 (see row 2 of the Process Table). That check is critical for data quality, so the pipeline stops to prevent bad data from being deployed.
What happens if all checks pass?
If all checks pass (both conditions False), the validation returns True and the pipeline continues to deploy the model or data (not shown in this trace).
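Both answers above depend on which check, if any, fails first. A plain True/False result does not carry that information, so an alert built on it cannot say why the pipeline stopped. One possible approach, sketched here with a hypothetical `first_failure` helper that is not part of the original lesson, is to return the name of the failing check:

```python
from typing import Optional
import pandas as pd

def first_failure(data: pd.DataFrame) -> Optional[str]:
    """Return the name of the first failing check, or None if all checks pass."""
    if data.isnull().sum().sum() > 0:
        return "missing values"
    if (data["age"] < 0).any():
        return "negative ages"
    return None

# Usage: the batch is valid exactly when no failure is reported.
reason = first_failure(pd.DataFrame({"age": [25, -3, 40]}))
is_valid = reason is None
```

The checks run in the same order as in the lesson's sample, so a batch with both problems reports the missing values first.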
Visual Quiz - 3 Questions
Test your understanding
Look at the Process Table. What is the result of the missing values check at step 1?
A. True
B. None
C. False
D. Error
💡 Hint
In the Process Table, row 1, the 'Result' column shows False for the missing values check.
At which step does the pipeline decide to stop?
A. Step 3
B. Step 1
C. Step 2
D. Pipeline never stops
💡 Hint
Look at row 3 of the Process Table, where the validation result is False and the action is to stop the pipeline.
If the data had no negative ages, how would the 'result' variable change?
A. It would be False
B. It would be True
C. It would be None
D. It would cause an error
💡 Hint
Check the 'result' row in the Status Tracker: if both conditions are False, validation returns True.
Concept Snapshot
Data validation in CI pipeline:
- Triggered by code or data update
- Runs checks like missing values and invalid data
- If validation passes, pipeline continues
- If validation fails, pipeline stops and alerts
- Prevents bad data/model deployment
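The checks in this snapshot can also be written as test functions, so that a test runner invoked by the CI job reports each check separately. The sketch below assumes pytest as the runner (it collects functions named `test_*`, and a failing `assert` makes the run exit non-zero); `load_batch` is a hypothetical loader standing in for however the pipeline reads the updated dataset.

```python
import pandas as pd

def load_batch() -> pd.DataFrame:
    # Hypothetical loader; a real pipeline would read the updated dataset here.
    return pd.DataFrame({"age": [34, 27, 51]})

def test_no_missing_values():
    data = load_batch()
    assert data.isnull().sum().sum() == 0, "batch contains missing values"

def test_no_negative_ages():
    data = load_batch()
    assert not (data["age"] < 0).any(), "batch contains negative ages"
```

Unlike the single `validate_data` function, separate tests do not short-circuit, so one run can report both problems.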
Full Transcript
When new data or code is committed, the CI pipeline starts. It runs the data validation tests step by step. First, it checks for missing values. If none are found, it checks for invalid data such as negative ages. If any check fails, the pipeline stops and alerts the team to fix the issue. If all checks pass, the pipeline continues and deploys the model or data. This process keeps data quality high and prevents bad data from reaching production.