
Model metadata and lineage in MLOps - Commands & Configuration

Introduction
When you train machine learning models, you need to keep track of details like parameters, the data used, and results. Model metadata and lineage let you record this information so you can understand and reproduce your models later. Track them when:
- You want to know which data and code produced a specific model version
- You need to compare different model versions to pick the best one
- You want to share model details with your team for collaboration
- You want to audit model training for compliance or debugging
- You want to automate retraining by tracking dependencies
Commands
This command runs the MLflow project in the current directory, starting a training run that logs metadata and lineage automatically.
Terminal
mlflow run .
Expected Output
2024/06/01 12:00:00 INFO mlflow.projects: === Run (ID abc123def456) started ===
2024/06/01 12:00:10 INFO mlflow.projects: === Run (ID abc123def456) succeeded ===
--experiment-name - Sets the experiment under which the run is logged
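As a sketch, a run can be filed under a named experiment and given parameter overrides with -P. This assumes an MLproject file in the current directory that declares an n_estimators parameter; the experiment name is illustrative:

```shell
# Log the run under a named experiment; -P overrides a declared parameter.
# Assumes the MLproject here defines an entry point with n_estimators.
mlflow run . --experiment-name iris-classifier -P n_estimators=200
```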
Starts the MLflow tracking UI so you can view model metadata, parameters, metrics, and lineage in a web browser.
Terminal
mlflow ui
Expected Output
2024/06/01 12:01:00 INFO mlflow.server: Starting MLflow UI at http://127.0.0.1:5000
--port - Specifies the port for the UI server
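For example, to serve the UI on a different port. The store path below is an assumption; point it at wherever your runs are actually logged:

```shell
# Serve the UI on port 8080, reading runs from the local ./mlruns store
mlflow ui --port 8080 --backend-store-uri ./mlruns
```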
Shows detailed metadata and lineage information for the specific run with ID abc123def456.
Terminal
mlflow runs describe --run-id abc123def456
Expected Output
Run ID: abc123def456
Parameters:
  learning_rate: 0.01
  epochs: 10
Metrics:
  accuracy: 0.92
Artifacts:
  model.pkl
Tags:
  mlflow.source.name: train.py
  mlflow.source.git.commit: 9f8e7d6
Key Concept

If you remember nothing else from this pattern, remember: tracking model metadata and lineage lets you reproduce, compare, and trust your machine learning models.

Code Example
Python
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Start MLflow run
with mlflow.start_run():
    # Log parameters
    n_estimators = 100
    mlflow.log_param("n_estimators", n_estimators)

    # Train model
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=42)
    model.fit(X_train, y_train)

    # Predict and log metric
    preds = model.predict(X_test)
    acc = accuracy_score(y_test, preds)
    mlflow.log_metric("accuracy", acc)

    # Log model artifact
    mlflow.sklearn.log_model(model, "model")

    print(f"Run completed with accuracy: {acc:.2f}")
Common Mistakes
Not logging parameters or metrics during training
Why it matters: without logging, you lose the details needed to understand or reproduce the model.
Fix: call MLflow logging functions like mlflow.log_param() and mlflow.log_metric() inside your training code.
Not starting the MLflow tracking server or UI
Why it matters: you cannot view or manage your model metadata and lineage without the UI or server running.
Fix: run 'mlflow ui' to start the tracking UI and access your runs.
Summary
Use MLflow commands to run training and automatically log model metadata and lineage.
Start the MLflow UI to view and compare model runs with their parameters and metrics.
Use MLflow logging functions in your code to record parameters, metrics, and artifacts for reproducibility.