
How to Use joblib for Model Saving in Python

Use joblib.dump() to save a trained model to a file and joblib.load() to load it back in Python. joblib can store scikit-learn models and other Python objects, and it is particularly efficient for objects that hold large NumPy arrays, so a saved model can be reused without retraining.

Syntax

The basic syntax for saving a model is joblib.dump(model, filename), where model is your trained model object and filename is the path to save the file. To load the model back, use model = joblib.load(filename).

  • joblib.dump: Saves the model to disk.
  • joblib.load: Loads the saved model from disk.
  • filename: String path to the file where the model is saved or loaded from.
python
import joblib

# Save model
joblib.dump(model, 'model_filename.joblib')

# Load model
model = joblib.load('model_filename.joblib')
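
The same two calls work for ordinary Python objects, not only models. The following is a minimal sketch under a few assumptions: the `settings` dictionary and the `settings.joblib` filename are made up for illustration, and a temporary directory is used so nothing is left on disk. It also passes a `pathlib.Path` instead of a string, which joblib accepts as a filename.

```python
import tempfile
from pathlib import Path

import joblib

# Hypothetical object to persist; any picklable Python object works the same way.
settings = {"n_estimators": 100, "classes": ["setosa", "versicolor", "virginica"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "settings.joblib"

    # joblib.dump returns a list of the file names it wrote.
    written = joblib.dump(settings, target)

    # Load the object back from the same path.
    restored = joblib.load(target)

print("Round trip preserved the object:", restored == settings)
```

Using a `Path` keeps the save and load calls pointed at one variable, which helps avoid mismatched filenames.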

Example

This example trains a simple scikit-learn model, saves it with joblib, and loads it back to make predictions.

python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import joblib

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Train model
model = RandomForestClassifier(random_state=42)
model.fit(X, y)

# Save model
joblib.dump(model, 'rf_iris_model.joblib')

# Load model
loaded_model = joblib.load('rf_iris_model.joblib')

# Predict with loaded model
predictions = loaded_model.predict(X[:5])
print('Predictions:', predictions)
Output
Predictions: [0 0 0 0 0]
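
One way to gain confidence in a saved file is to check that the loaded model predicts exactly what the original does. The sketch below repeats the training step from the example and compares predictions over the whole dataset; writing to a temporary directory and the variable name `same_predictions` are choices made here for illustration, not part of the original example.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train the same model as in the example above.
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=42).fit(X, y)

# Save and reload inside a temporary directory (an assumption of this sketch).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "rf_iris_model.joblib"
    joblib.dump(model, path)
    loaded_model = joblib.load(path)

# The reloaded model should reproduce the original predictions exactly.
same_predictions = np.array_equal(model.predict(X), loaded_model.predict(X))
print("Loaded model matches original:", same_predictions)
```

A check like this can run right after saving, before the file is handed to another script or service.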

Common Pitfalls

Common mistakes include:

  • Saving the model before training it, so the file contains an untrained model.
  • Using a different filename or path when loading than the one used when saving.
  • Forgetting to install joblib (pip install joblib) or to import it before calling its functions.
  • Trying to load a model file that does not exist (joblib.load raises FileNotFoundError) or that is corrupted.

Always ensure the model is trained before saving and use the exact filename when loading.

python
import joblib
from sklearn.ensemble import RandomForestClassifier

# Wrong: saving before training
# model = RandomForestClassifier()
# joblib.dump(model, 'model.joblib')  # Model not trained yet

# Right: train first, then save
model = RandomForestClassifier()
model.fit([[0, 0], [1, 1]], [0, 1])
joblib.dump(model, 'model.joblib')
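
The untrained-model and missing-file pitfalls can also be caught in code. The sketch below is one possible approach, not a joblib feature: `save_if_fitted` and `load_if_exists` are hypothetical helper names introduced here. The first relies on scikit-learn's `check_is_fitted`, which raises `NotFittedError` for an untrained estimator, and the second checks that the file exists before calling `joblib.load`.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted


def save_if_fitted(model, path):
    """Hypothetical helper: save the estimator only if it has been fitted."""
    check_is_fitted(model)  # raises NotFittedError for an untrained estimator
    joblib.dump(model, path)


def load_if_exists(path):
    """Hypothetical helper: return the loaded object, or None if the file is missing."""
    if not Path(path).is_file():
        return None
    return joblib.load(path)


with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.joblib"

    # Pitfall: saving before training is refused.
    untrained = RandomForestClassifier(n_estimators=5, random_state=0)
    try:
        save_if_fitted(untrained, model_path)
        refused_untrained = False
    except NotFittedError:
        refused_untrained = True

    # Pitfall: loading a file that was never written returns None here.
    missing = load_if_exists(model_path)

    # Correct order: train, save, then load.
    trained = RandomForestClassifier(n_estimators=5, random_state=0)
    trained.fit([[0, 0], [1, 1]], [0, 1])
    save_if_fitted(trained, model_path)
    restored = load_if_exists(model_path)

print("Untrained model refused:", refused_untrained)
print("Missing file handled:", missing is None)
```

Returning None for a missing file is a design choice of this sketch; letting FileNotFoundError propagate is equally reasonable when a missing model should stop the program.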

Quick Reference

Function    | Purpose              | Example Usage
joblib.dump | Save model to file   | joblib.dump(model, 'model.joblib')
joblib.load | Load model from file | model = joblib.load('model.joblib')

Key Takeaways

  • Use joblib.dump() to save trained sklearn models efficiently to disk.
  • Load saved models with joblib.load() to reuse them without retraining.
  • Always train your model before saving it with joblib.
  • Keep filenames consistent between saving and loading to avoid errors.
  • Ensure joblib is installed and imported before using its functions.