How to Use MLflow for Tracking Machine Learning Experiments
Use mlflow.start_run() to begin tracking an experiment, then log parameters with mlflow.log_param(), metrics with mlflow.log_metric(), and models with mlflow.sklearn.log_model(). This keeps your model training details and results in one place.
Syntax
MLflow tracking uses a simple pattern to log your machine learning experiments:
- mlflow.start_run(): Starts a new experiment run.
- mlflow.log_param(key, value): Logs a parameter such as a learning rate or number of trees.
- mlflow.log_metric(key, value): Logs a metric such as accuracy or loss.
- mlflow.sklearn.log_model(model, name): Saves the trained model for later use.
- mlflow.end_run(): Ends the current run (optional; runs started with a with block end automatically).
```python
import mlflow

with mlflow.start_run():
    mlflow.log_param("param1", 5)
    mlflow.log_metric("accuracy", 0.85)
    # model training and logging here
    # mlflow.sklearn.log_model(model, "model")
```
Example
This example shows how to track a simple scikit-learn model training with MLflow. It logs parameters, metrics, and the model itself.
```python
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Start MLflow run
with mlflow.start_run():
    # Define and train model
    n_estimators = 100
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=42)
    model.fit(X_train, y_train)

    # Predict and calculate accuracy
    preds = model.predict(X_test)
    acc = accuracy_score(y_test, preds)

    # Log parameters and metrics
    mlflow.log_param("n_estimators", n_estimators)
    mlflow.log_metric("accuracy", acc)

    # Log the model
    mlflow.sklearn.log_model(model, "random_forest_model")

    print(f"Logged model with accuracy: {acc:.4f}")
```
Output
Logged model with accuracy: 1.0000
Common Pitfalls
Common mistakes when using MLflow tracking include:
- Not calling mlflow.start_run(), which causes logs to be ignored.
- Logging parameters or metrics outside the run context.
- Forgetting to log the model after training.
- Overwriting runs by not managing run IDs or experiment names.
Always use with mlflow.start_run(): to ensure logs are saved properly. (Note: some MLflow versions auto-start a run when you log without one, but relying on that makes runs harder to organize.)
```python
import mlflow

# Wrong way: logging outside a run
# mlflow.log_param("param", 10)  # This will raise an error

# Right way:
with mlflow.start_run():
    mlflow.log_param("param", 10)
```
Output
Traceback (most recent call last):
  File "example.py", line 4, in <module>
    mlflow.log_param("param", 10)
  File "/usr/local/lib/python3.8/site-packages/mlflow/tracking/fluent.py", line 456, in log_param
    _get_active_run_or_raise().log_param(key, value)
  File "/usr/local/lib/python3.8/site-packages/mlflow/tracking/fluent.py", line 222, in _get_active_run_or_raise
    raise MlflowException("No active run")
mlflow.exceptions.MlflowException: No active run
Quick Reference
Here is a quick summary of MLflow tracking commands:
| Command | Description |
|---|---|
| mlflow.start_run() | Start a new experiment run context |
| mlflow.log_param(key, value) | Log a parameter (e.g., hyperparameter) |
| mlflow.log_metric(key, value) | Log a metric (e.g., accuracy) |
| mlflow.sklearn.log_model(model, name) | Save a trained scikit-learn model |
| mlflow.end_run() | End the current run (optional) |
Key Takeaways
- Always use mlflow.start_run() to begin tracking an experiment run.
- Log parameters and metrics inside the run context so they are saved properly.
- Use mlflow.sklearn.log_model() to save your trained model for later use.
- Avoid logging outside a run context to prevent errors.
- MLflow helps organize and compare multiple experiment runs easily.