Self-service ML platform architecture in MLOps - Commands & Configuration

Introduction
Building machine learning models takes many steps and tools. A self-service ML platform lets teams create, train, and deploy models on their own instead of asking an infrastructure team for help each time. It is a good fit:
When data scientists want to train models without waiting for IT support
When multiple teams need to share ML tools and resources efficiently
When you want to automate model deployment to production quickly
When you want to track experiments and results in one place
When you want to reuse code and data pipelines across projects
Config File - mlflow_tracking_server.yaml
apiVersion: v1
kind: Service
metadata:
  name: mlflow-tracking
  labels:
    app: mlflow
spec:
  type: ClusterIP
  ports:
    - port: 5000
      targetPort: 5000
      protocol: TCP
      name: http
  selector:
    app: mlflow
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow-tracking
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mlflow
  template:
    metadata:
      labels:
        app: mlflow
    spec:
      containers:
      - name: mlflow
        image: mlflow/mlflow:2.5.0
        ports:
        - containerPort: 5000
        command: ["mlflow", "server", "--backend-store-uri", "sqlite:///mlflow.db", "--default-artifact-root", "/mlflow/artifacts", "--host", "0.0.0.0"]
        volumeMounts:
        - name: mlflow-artifacts
          mountPath: /mlflow/artifacts
      volumes:
      - name: mlflow-artifacts
        emptyDir: {}

This Kubernetes YAML file deploys an MLflow tracking server as a self-service ML platform component.

  • Service: Exposes MLflow server inside the cluster on port 5000.
  • Deployment: Runs one replica of the MLflow server container.
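  • Command: Starts the MLflow server with a SQLite backend store and an artifact directory on an emptyDir volume. Both are lost when the pod is deleted, so this setup suits a demo rather than production.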
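How the Service finds the Deployment's pods can be checked mechanically. The following is a minimal sketch, assuming the two manifests have already been loaded into Python dictionaries (for example with a YAML parser). The helper name `service_targets_pods` is hypothetical, and the dictionaries copy only the fields from the manifest above that matter for routing.

```python
# Minimal sketch: would this Service route traffic to this Deployment's pods?
# Assumption: targetPort is numeric (Kubernetes also allows named ports).

service = {
    "spec": {
        "selector": {"app": "mlflow"},
        "ports": [{"port": 5000, "targetPort": 5000}],
    }
}

deployment = {
    "spec": {
        "template": {
            "metadata": {"labels": {"app": "mlflow"}},
            "spec": {
                "containers": [
                    {"name": "mlflow", "ports": [{"containerPort": 5000}]}
                ]
            },
        }
    }
}


def service_targets_pods(service: dict, deployment: dict) -> bool:
    """Return True if the selector matches the pod labels and each targetPort is declared."""
    selector = service["spec"]["selector"]
    template = deployment["spec"]["template"]
    pod_labels = template["metadata"]["labels"]

    # Every key/value pair in the selector must appear in the pod labels.
    labels_match = all(pod_labels.get(k) == v for k, v in selector.items())

    # containerPort is informational in Kubernetes, but comparing it with
    # targetPort is a cheap lint against typos.
    container_ports = {
        port["containerPort"]
        for container in template["spec"]["containers"]
        for port in container.get("ports", [])
    }
    ports_match = all(
        port["targetPort"] in container_ports for port in service["spec"]["ports"]
    )
    return labels_match and ports_match


print(service_targets_pods(service, deployment))
```

If the selector were changed to `app: mlflow-server`, the check would fail, which mirrors what happens in the cluster: the Service would have no endpoints.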
Commands
This command creates the MLflow tracking server deployment and service in the Kubernetes cluster so the platform is available for users.
Terminal
kubectl apply -f mlflow_tracking_server.yaml
Expected Output
service/mlflow-tracking created
deployment.apps/mlflow-tracking created
Check that the MLflow tracking server pod is running and ready for use.
Terminal
kubectl get pods -l app=mlflow
Expected Output
NAME                               READY   STATUS    RESTARTS   AGE
mlflow-tracking-7d8f9c7f5b-abcde   1/1     Running   0          30s
-l - Filter pods by label
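A running pod does not guarantee that the server answers requests. The sketch below probes the server over HTTP using only the Python standard library. It rests on two assumptions that are not in the text above: that the Service has been forwarded to your machine (for example with `kubectl port-forward service/mlflow-tracking 5000:5000`), and that the server exposes a `/health` endpoint, as MLflow 2.x servers do to the best of my knowledge. The function name `tracking_server_is_healthy` is hypothetical.

```python
import urllib.error
import urllib.request


def tracking_server_is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, DNS failure,
        # or a malformed URL: report the server as not healthy.
        return False


if __name__ == "__main__":
    # Assumes the port-forward described above is active.
    print(tracking_server_is_healthy("http://localhost:5000"))
```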
Create a new MLflow experiment to organize and track machine learning runs.
Terminal
mlflow experiments create --experiment-name my-first-experiment
Expected Output
Created experiment with ID 1
--experiment-name - Name the new experiment
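The CLI command above talks to whichever server `MLFLOW_TRACKING_URI` points at. The same operation can be sketched against the tracking server's REST API, which is useful when a platform tool creates experiments on behalf of users. This sketch assumes the server is reachable at `http://localhost:5000` and uses the `experiments/create` endpoint of the MLflow REST API. The function names are hypothetical, and the example only builds the request, it does not send it.

```python
import json
import urllib.request


def build_create_experiment_request(base_url: str, name: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates an experiment."""
    url = base_url.rstrip("/") + "/api/2.0/mlflow/experiments/create"
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_experiment(base_url: str, name: str) -> str:
    """Send the request and return the experiment ID reported by the server."""
    request = build_create_experiment_request(base_url, name)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())["experiment_id"]


# Building the request needs no running server.
request = build_create_experiment_request("http://localhost:5000", "my-first-experiment")
print(request.get_method(), request.full_url)
```

Calling `create_experiment("http://localhost:5000", "my-first-experiment")` would send the request. It raises an HTTP error if an experiment with that name already exists.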
Run a sample MLflow project from a GitHub repo with a parameter to train a model and log results automatically.
Terminal
mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=0.5
Expected Output
2024/06/01 12:00:00 INFO mlflow.projects: === Run (ID '1234567890abcdef') succeeded ===
-P - Set parameters for the run
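When a platform launches projects for users, the `mlflow run` command line is usually assembled by a script rather than typed by hand. A minimal sketch of that assembly follows. The helper `build_mlflow_run_command` is hypothetical. It emits one `-P` flag per parameter and, optionally, the `--experiment-name` flag so the run lands in a named experiment.

```python
import shlex
from typing import Dict, List, Optional


def build_mlflow_run_command(
    project_uri: str,
    params: Dict[str, object],
    experiment_name: Optional[str] = None,
) -> List[str]:
    """Return the argv list for `mlflow run`, one -P flag per parameter."""
    command = ["mlflow", "run", project_uri]
    if experiment_name is not None:
        command += ["--experiment-name", experiment_name]
    for key, value in params.items():
        command += ["-P", f"{key}={value}"]
    return command


command = build_mlflow_run_command(
    "https://github.com/mlflow/mlflow-example.git", {"alpha": 0.5}
)
# Print the command as it would be typed in a shell.
print(shlex.join(command))
```

The list can be passed to `subprocess.run(command, check=True)`. Keeping it as a list rather than a string avoids shell quoting problems with parameter values.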
Key Concept

If you remember nothing else from this pattern, remember: a self-service ML platform lets teams run, track, and share ML work easily without waiting for infrastructure help.

Code Example
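import mlflow

# Point the client at the tracking server. localhost:5000 assumes the
# mlflow-tracking Service has been forwarded to your machine, for example with
# kubectl port-forward service/mlflow-tracking 5000:5000
mlflow.set_tracking_uri("http://localhost:5000")
# Select the experiment by name; MLflow creates it if it does not exist yet.
mlflow.set_experiment("my-first-experiment")

# Everything logged inside the block is attached to one run.
with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)   # hyperparameter
    mlflow.log_metric("rmse", 1.23)  # evaluation result
    print("Run logged successfully")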
Output
Run logged successfully
Common Mistakes
Mistake: Not exposing the MLflow server correctly in Kubernetes
Why it hurts: Users cannot reach the tracking UI or API, so the platform is unusable
Fix: Create a Kubernetes Service whose port and selector match the MLflow pods
Mistake: Running MLflow without specifying a backend store and artifact root
Why it hurts: MLflow falls back to a local default store (./mlruns in MLflow 2.x), which inside a container is lost when the pod is replaced, taking the tracking data with it
Fix: Always pass --backend-store-uri and --default-artifact-root when starting the MLflow server, and point them at storage that outlives the pod
Mistake: Not creating experiments before running MLflow projects
Why it hurts: Runs are logged to the Default experiment, which makes them hard to organize and compare
Fix: Create named experiments to keep runs organized
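The storage mistake can be guarded against in whatever script launches the server. The following is a minimal sketch with a hypothetical helper, `build_server_command`, that refuses to build the `mlflow server` command line unless both storage locations are given explicitly. The values in the example are the ones from the manifest in this lesson.

```python
from typing import List


def build_server_command(
    backend_store_uri: str,
    artifact_root: str,
    host: str = "0.0.0.0",
    port: int = 5000,
) -> List[str]:
    """Return the argv list for `mlflow server`, rejecting empty storage settings."""
    if not backend_store_uri:
        raise ValueError("backend_store_uri must be set explicitly")
    if not artifact_root:
        raise ValueError("artifact_root must be set explicitly")
    return [
        "mlflow", "server",
        "--backend-store-uri", backend_store_uri,
        "--default-artifact-root", artifact_root,
        "--host", host,
        "--port", str(port),
    ]


command = build_server_command("sqlite:///mlflow.db", "/mlflow/artifacts")
print(command)
```

Failing early in the launcher is a design choice: a server that silently starts with default storage looks healthy until the first pod restart.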
Summary
Deploy an MLflow tracking server to provide a self-service ML platform.
Verify the server is running and accessible inside the cluster.
Create experiments to organize ML runs.
Run ML projects that log parameters and metrics automatically.

Practice

1. What is the main purpose of a self-service ML platform in an organization?
easy
A. To monitor only the hardware usage of ML servers
B. To replace data scientists with automated tools
C. To enable teams to build and deploy ML models independently and faster
D. To store large amounts of raw data without processing

Solution

  1. Step 1: Understand the role of self-service ML platforms

    These platforms are designed to help teams work faster and independently by providing tools and interfaces for ML tasks.
  2. Step 2: Compare options with this purpose

    Options A, B, and D do not focus on enabling teams to build and deploy models independently.
  3. Final Answer:

    To enable teams to build and deploy ML models independently and faster -> Option C
  4. Quick Check:

    Self-service ML platform purpose = Enable independent, faster ML work [OK]
Hint: Focus on independence and speed for ML teams [OK]
Common Mistakes:
  • Confusing data storage with platform purpose
  • Thinking it replaces data scientists
  • Assuming it only monitors hardware
2. Which component is essential in a self-service ML platform for managing model versions?
easy
A. Model registry
B. Data ingestion pipeline
C. Experiment tracking UI
D. Security gateway

Solution

  1. Step 1: Identify the component for model version management

    The model registry is designed to store and manage different versions of ML models.
  2. Step 2: Eliminate other options

    The data ingestion pipeline handles data, the experiment tracking UI logs experiments, and the security gateway manages access. None of them manages model versions.
  3. Final Answer:

    Model registry -> Option A
  4. Quick Check:

    Model version management = Model registry [OK]
Hint: Model versions live in the registry, not data or security parts [OK]
Common Mistakes:
  • Confusing experiment tracking with model versioning
  • Choosing data pipeline for model management
  • Mixing security with model storage
3. Given a self-service ML platform with components: UI, data pipeline, model registry, deployment, and monitoring, which sequence correctly represents the typical workflow?
medium
A. UI -> Data pipeline -> Model registry -> Deployment -> Monitoring
B. Data pipeline -> Model registry -> UI -> Deployment -> Monitoring
C. Data pipeline -> UI -> Model registry -> Deployment -> Monitoring
D. UI -> Model registry -> Data pipeline -> Deployment -> Monitoring

Solution

  1. Step 1: Understand the typical ML workflow in a self-service platform

    The user interacts with the UI first to start tasks, then data is processed, models are registered, deployed, and monitored.
  2. Step 2: Match the sequence with this logic

    Option A starts with the UI, then moves through the data pipeline, model registry, deployment, and monitoring, which fits the workflow.
  3. Final Answer:

    UI -> Data pipeline -> Model registry -> Deployment -> Monitoring -> Option A
  4. Quick Check:

    Workflow order = UI first, then data, model, deploy, monitor [OK]
Hint: User starts at UI, then data, model, deploy, monitor [OK]
Common Mistakes:
  • Starting workflow with data pipeline instead of UI
  • Mixing order of model registry and UI
  • Placing data pipeline after deployment
4. A self-service ML platform's deployment component fails to update models after new versions are registered. What is the most likely cause?
medium
A. The data pipeline is processing data too slowly
B. The model registry is not linked to the deployment pipeline
C. The UI does not allow model version selection
D. Monitoring tools are not configured

Solution

  1. Step 1: Analyze the failure symptom

    Deployment does not update models after new versions are registered, indicating a disconnect between model registry and deployment.
  2. Step 2: Evaluate options for cause

    Slow data pipeline or UI issues won't stop deployment updates; monitoring tools affect tracking, not deployment.
  3. Final Answer:

    The model registry is not linked to the deployment pipeline -> Option B
  4. Quick Check:

    Deployment update failure = Missing link to model registry [OK]
Hint: Check if deployment connects to model registry for updates [OK]
Common Mistakes:
  • Blaming data pipeline speed for deployment issues
  • Assuming UI controls deployment updates
  • Confusing monitoring with deployment functionality
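The link between the registry and the deployment component in question 4 can be illustrated with a toy model. This is a sketch only: `ModelRegistry` and `Deployment` are hypothetical in-memory classes, not the API of any real registry, and the model name `churn-model` is made up. A deployment that is subscribed to the registry follows new versions, and one that is not subscribed stays behind.

```python
from typing import Callable, Dict, List


class ModelRegistry:
    """Toy in-memory registry that notifies listeners about new versions."""

    def __init__(self) -> None:
        self._versions: Dict[str, int] = {}
        self._listeners: List[Callable[[str, int], None]] = []

    def subscribe(self, listener: Callable[[str, int], None]) -> None:
        self._listeners.append(listener)

    def register(self, model_name: str) -> int:
        # Version numbers start at 1 and increase by one per registration.
        version = self._versions.get(model_name, 0) + 1
        self._versions[model_name] = version
        for listener in self._listeners:
            listener(model_name, version)
        return version


class Deployment:
    """Toy deployment that serves whatever version it was last told about."""

    def __init__(self) -> None:
        self.serving: Dict[str, int] = {}

    def on_new_version(self, model_name: str, version: int) -> None:
        self.serving[model_name] = version


registry = ModelRegistry()
linked = Deployment()
unlinked = Deployment()  # never subscribed: the failure from question 4
registry.subscribe(linked.on_new_version)

registry.register("churn-model")
registry.register("churn-model")

print(linked.serving)    # {'churn-model': 2}
print(unlinked.serving)  # {}
```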
5. You want to design a self-service ML platform that allows data scientists to run experiments, register models, deploy them, and monitor performance with minimal manual steps. Which architectural feature best supports this goal?
hard
A. Relying on external tools for monitoring without integration
B. Separating data ingestion and model deployment into isolated manual workflows
C. Using a UI that only displays model metrics without deployment controls
D. Integrating experiment tracking with automated model registration and deployment pipelines

Solution

  1. Step 1: Identify the goal of minimal manual steps

    This requires automation and integration between experiment tracking, model registration, and deployment.
  2. Step 2: Evaluate architectural options

    Option D connects experiment tracking, model registration, and deployment through automation, which supports the goal. Options A, B, and C involve manual or disconnected steps.
  3. Final Answer:

    Integrating experiment tracking with automated model registration and deployment pipelines -> Option D
  4. Quick Check:

    Automation and integration = minimal manual steps [OK]
Hint: Automation and integration reduce manual work [OK]
Common Mistakes:
  • Choosing isolated manual workflows
  • Ignoring deployment controls in UI
  • Using disconnected monitoring tools
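The integration described in question 5 often comes down to an automated gate between tracking and registration. The sketch below is a minimal illustration under stated assumptions: `Run`, `select_run_to_promote`, and `promotion_steps` are hypothetical names, the run IDs and metric values are made up, and a lower RMSE is treated as better.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    run_id: str
    rmse: float


def select_run_to_promote(runs: List[Run], max_rmse: float) -> Optional[Run]:
    """Return the run with the lowest RMSE if it meets the threshold, else None."""
    if not runs:
        return None
    best = min(runs, key=lambda run: run.rmse)
    return best if best.rmse <= max_rmse else None


def promotion_steps(runs: List[Run], max_rmse: float) -> List[str]:
    """List the automated steps that would follow the tracked runs."""
    best = select_run_to_promote(runs, max_rmse)
    if best is None:
        return ["no run met the threshold; nothing registered or deployed"]
    return [
        f"register model from run {best.run_id}",
        f"deploy model from run {best.run_id}",
        f"monitor model from run {best.run_id}",
    ]


runs = [Run("a1", 1.23), Run("b2", 0.98), Run("c3", 1.10)]
for step in promotion_steps(runs, max_rmse=1.0):
    print(step)
```

In a real platform each step would call the registry, the deployment pipeline, and the monitoring system. The point of the sketch is that no person has to trigger them by hand.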