How to Do CI/CD for Machine Learning Projects
To do CI/CD for machine learning, automate your model training, testing, and deployment with pipelines that run on code changes. Use tools such as GitHub Actions or Jenkins to test data and code, retrain models, and deploy updated models automatically.
Syntax
A typical CI/CD pipeline for ML includes these steps:
- Code and Data Validation: Run unit tests on the code and quality checks on the data.
- Model Training: Train the ML model automatically.
- Model Testing: Evaluate model performance on test data.
- Model Packaging: Prepare the model for deployment.
- Deployment: Deploy the model to production or staging.
Each step can be scripted and triggered by code changes using CI/CD tools.
```yaml
name: ML CI/CD Pipeline
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run tests
        run: pytest tests/
      - name: Train model
        run: python train.py
      - name: Evaluate model
        run: python evaluate.py
      - name: Deploy model
        if: success()
        run: python deploy.py
```
Example
This example shows a simple Python script for training and testing a model, integrated into a GitHub Actions workflow that runs on every push to the main branch.
```python
# train.py -- train a model and save it to disk
import pickle

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, y = iris.data, iris.target
model = RandomForestClassifier()
model.fit(X, y)
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

# evaluate.py -- load the saved model and report accuracy
# (for simplicity this evaluates on the training data, which is why the
# accuracy below is a perfect 1.00; a real pipeline should use a held-out set)
import pickle

from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

iris = load_iris()
X, y = iris.data, iris.target
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)
preds = model.predict(X)
acc = accuracy_score(y, preds)
print(f"Model accuracy: {acc:.2f}")

# deploy.py -- placeholder deployment step
print("Deploying model... (this is a placeholder)")
```
Output
Model accuracy: 1.00
Deploying model... (this is a placeholder)
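The workflow's "Run tests" step assumes a tests/ directory that the example never shows. A minimal pytest sketch might look like the following; the file name, the 0.9 accuracy threshold, and the use of a held-out split are illustrative assumptions, not part of the original pipeline:

```python
# test_model.py -- minimal pytest checks for the pipeline's "Run tests" step.
# The 0.9 accuracy threshold is an assumed quality gate for this sketch.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def test_data_shape():
    # Basic data validation: expected feature count and a non-empty dataset.
    iris = load_iris()
    assert iris.data.shape[1] == 4
    assert len(iris.target) > 0


def test_model_accuracy():
    # Train on a held-out split so the accuracy check is meaningful.
    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.2, random_state=42
    )
    model = RandomForestClassifier(random_state=42)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    assert acc >= 0.9
```

Running `pytest tests/` in the workflow then fails the build, and blocks the deploy step, whenever either check breaks.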
Common Pitfalls
Common mistakes when setting up CI/CD for ML include:
- Not versioning data and models, causing confusion about which model is deployed.
- Skipping automated tests for data quality and model performance.
- Deploying models without validation, leading to poor predictions in production.
- Ignoring environment differences between training and deployment.
Always include data checks, model evaluation, and environment consistency in your pipeline.
```yaml
# Wrong way: deploy without testing
- name: Deploy model
  run: python deploy.py

# Right way: deploy only if tests pass
- name: Run tests
  run: pytest tests/
- name: Deploy model
  if: success()
  run: python deploy.py
```
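To make the "data checks" advice concrete, here is one possible shape for a validation script run before training. The script name, the specific checks, and the non-negativity rule (reasonable for iris measurements) are assumptions for this sketch:

```python
# check_data.py -- illustrative data-quality gate run before training.
# The specific checks and thresholds are assumptions for this sketch.
import numpy as np
from sklearn.datasets import load_iris


def validate_features(X: np.ndarray) -> list[str]:
    """Return a list of data-quality problems; an empty list means the data passes."""
    problems = []
    if X.shape[0] == 0:
        problems.append("dataset is empty")
    elif np.isnan(X).any():
        problems.append("features contain NaN values")
    elif (X < 0).any():
        # Iris features are physical measurements, so negatives signal bad data.
        problems.append("negative measurements found")
    return problems


if __name__ == "__main__":
    issues = validate_features(load_iris().data)
    if issues:
        raise SystemExit("Data validation failed: " + "; ".join(issues))
    print("Data validation passed")
```

A non-zero exit code from such a script stops the CI job, so a bad dataset can never reach the training or deploy steps.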
Quick Reference
Tips for effective ML CI/CD:
- Use version control for code, data, and models.
- Automate testing for data quality and model accuracy.
- Keep training and deployment environments consistent.
- Use containerization (e.g., Docker) for reproducibility.
- Monitor deployed models for performance drift and retrain as needed.
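As a sketch of the drift-monitoring tip above, one simple approach is to flag features whose live mean has shifted relative to the training distribution. The function name and the 0.5-standard-deviation threshold are illustrative assumptions, not a standard:

```python
# drift_check.py -- a minimal sketch of monitoring for feature drift.
# The 0.5-standard-deviation threshold is an illustrative assumption.
import numpy as np


def feature_drift(train_X: np.ndarray, live_X: np.ndarray,
                  threshold: float = 0.5) -> list[int]:
    """Return indices of features whose live mean shifted by more than
    `threshold` training standard deviations."""
    mu = train_X.mean(axis=0)
    sigma = train_X.std(axis=0)
    # Guard against zero variance to avoid division by zero.
    shift = np.abs(live_X.mean(axis=0) - mu) / np.where(sigma == 0, 1.0, sigma)
    return [i for i, s in enumerate(shift) if s > threshold]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(0.0, 1.0, size=(1000, 3))
    live = train.copy()
    live[:, 1] += 2.0  # simulate drift in the second feature
    print("Drifted features:", feature_drift(train, live))
```

In practice this check would run on a schedule against production inputs, with a detected drift triggering the retraining pipeline.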
Key Takeaways
- Automate training, testing, and deployment steps using CI/CD pipelines triggered by code changes.
- Always validate data and model performance before deploying to production.
- Version control your code, data, and models to track changes and ensure reproducibility.
- Use consistent environments and containerization to avoid deployment issues.
- Monitor models after deployment to detect and fix performance drops.
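One lightweight way to version models, sketched below, is to tag each serialized artifact with a content hash that can be recorded alongside the code commit and data version. The function and filename scheme are assumptions for illustration:

```python
# model_version.py -- sketch of content-addressed model versioning.
# The 12-character tag and "model-<tag>.pkl" naming are illustrative choices.
import hashlib
import pickle


def model_version(model) -> str:
    """Return a short content hash of the pickled model."""
    payload = pickle.dumps(model)
    return hashlib.sha256(payload).hexdigest()[:12]


if __name__ == "__main__":
    model = {"weights": [0.1, 0.2, 0.3]}  # stand-in for a trained model object
    tag = model_version(model)
    print(f"model-{tag}.pkl")
```

Because the tag is derived from the artifact's bytes, the same model always gets the same identifier, which makes it easy to confirm exactly which model is deployed.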