
Saving pipelines (joblib, pickle) in ML Python

Introduction
Saving a pipeline writes its fitted preprocessing steps and model to a file, so you can load and use them later without retraining. Common reasons to do this:
You want to reuse a trained model and preprocessing steps without retraining.
You need to share your trained pipeline with a teammate or deploy it in an app.
You want to save time by loading a ready-to-use pipeline instead of starting from scratch.
You want to keep a backup of your model and preprocessing for future use.
You want to test your model on new data later without repeating training.
Syntax
ML Python
import joblib

# Save pipeline
joblib.dump(pipeline, 'pipeline_filename.joblib')

# Load pipeline
pipeline = joblib.load('pipeline_filename.joblib')
Use joblib.dump() to save a pipeline and joblib.load() to load it.
You can also use the pickle module from the standard library; joblib is usually more efficient for objects that contain large NumPy arrays.
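As a minimal sketch of the joblib call in practice, the snippet below fits a small pipeline and saves it twice: once as-is and once with the optional compress argument of joblib.dump, which trades some speed for a smaller file. The pipeline contents, the temporary directory, the file names, and the compression level 3 are illustrative assumptions.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative pipeline (assumption): any fitted scikit-learn pipeline is saved the same way.
X, y = load_iris(return_X_y=True)
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(max_iter=200)),
])
pipeline.fit(X, y)

# Write into a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, 'pipeline_plain.joblib')
    packed_path = os.path.join(tmp_dir, 'pipeline_packed.joblib')

    # Default call: no compression.
    joblib.dump(pipeline, plain_path)

    # compress takes an integer from 0 (off) to 9 (smallest file, slowest).
    joblib.dump(pipeline, packed_path, compress=3)

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)

    # Loading works the same way whether or not the file was compressed.
    restored = joblib.load(packed_path)

print(f'Uncompressed: {plain_size} bytes, compressed: {packed_size} bytes')
```

Compression tends to matter more as the saved object grows; for a pipeline this small the difference in file size is minor.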
Examples
Save the pipeline object my_pipeline to a file named model.joblib.
ML Python
import joblib

joblib.dump(my_pipeline, 'model.joblib')
Load the saved pipeline from the file model.joblib back into a variable.
ML Python
import joblib

loaded_pipeline = joblib.load('model.joblib')
Save the pipeline using pickle to a file pipeline.pkl.
ML Python
import pickle

with open('pipeline.pkl', 'wb') as f:
    pickle.dump(my_pipeline, f)
Load the pipeline saved with pickle from the file pipeline.pkl.
ML Python
import pickle

with open('pipeline.pkl', 'rb') as f:
    loaded_pipeline = pickle.load(f)
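pickle can also serialize to a bytes object in memory instead of a file, which may help when the pipeline has to go into a database field or over a network connection. The sketch below assumes a small iris pipeline standing in for my_pipeline; the variable names payload and matches are illustrative.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative stand-in for my_pipeline (assumption).
X, y = load_iris(return_X_y=True)
my_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(max_iter=200)),
])
my_pipeline.fit(X, y)

# Serialize to bytes rather than to a file.
payload = pickle.dumps(my_pipeline, protocol=pickle.HIGHEST_PROTOCOL)

# Rebuild the pipeline from the bytes.
loaded_pipeline = pickle.loads(payload)

# Check that the rebuilt pipeline predicts like the original.
matches = bool((loaded_pipeline.predict(X) == my_pipeline.predict(X)).all())
print(f'{len(payload)} bytes, predictions match: {matches}')
```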
Sample Model
This program trains a pipeline with scaling and logistic regression on iris data, saves it, loads it back, and tests predictions and accuracy.
ML Python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import joblib

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(max_iter=200))
])

# Train pipeline
pipeline.fit(X_train, y_train)

# Save pipeline
joblib.dump(pipeline, 'iris_pipeline.joblib')

# Load pipeline
loaded_pipeline = joblib.load('iris_pipeline.joblib')

# Predict with loaded pipeline
predictions = loaded_pipeline.predict(X_test)

# Calculate accuracy
accuracy = loaded_pipeline.score(X_test, y_test)

print(f'Predictions: {predictions}')
print(f'Accuracy: {accuracy:.2f}')
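One way to gain confidence in a saved file is to compare the loaded pipeline against the original before relying on it. The following sketch repeats the training step above, writes to a temporary directory (an assumption made to keep the example self-contained), and checks that both pipelines give the same predictions and accuracy on the test set.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same data split and pipeline as the sample program.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(max_iter=200)),
])
pipeline.fit(X_train, y_train)

# Save and reload through a temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'iris_pipeline.joblib')
    joblib.dump(pipeline, path)
    loaded_pipeline = joblib.load(path)

# Compare the original and the loaded pipeline on the same test data.
predictions_match = np.array_equal(
    pipeline.predict(X_test), loaded_pipeline.predict(X_test)
)
scores_match = pipeline.score(X_test, y_test) == loaded_pipeline.score(X_test, y_test)

print(f'Predictions match: {predictions_match}')
print(f'Accuracy matches: {scores_match}')
```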
Important Notes
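Save the entire pipeline, not just the final model, so the fitted preprocessing steps and the model stay together.
Prefer joblib when the pipeline holds large NumPy arrays; it usually saves and loads them more efficiently than pickle.
Load the pipeline in an environment that has the same versions of scikit-learn and its dependencies as the one used for saving; loading across versions is not guaranteed to work.
Only load joblib or pickle files from sources you trust, because loading them can execute arbitrary code.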
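One possible way to make environment mismatches easier to spot is to store the scikit-learn version next to the pipeline and compare it at load time. The dictionary layout and its keys below are a hypothetical convention for this sketch, not a joblib or scikit-learn feature.

```python
import os
import tempfile

import joblib
import sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(max_iter=200)),
])
pipeline.fit(X, y)

# Hypothetical bundle: the pipeline plus the library version it was trained with.
bundle = {
    'pipeline': pipeline,
    'sklearn_version': sklearn.__version__,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'iris_bundle.joblib')
    joblib.dump(bundle, path)
    loaded_bundle = joblib.load(path)

# Compare the saved version with the version installed right now.
versions_match = loaded_bundle['sklearn_version'] == sklearn.__version__
if not versions_match:
    print('Warning: pipeline was saved with scikit-learn',
          loaded_bundle['sklearn_version'], 'but', sklearn.__version__, 'is installed')

loaded_pipeline = loaded_bundle['pipeline']
print(f'Versions match: {versions_match}')
```

The check only reports a mismatch; it does not make an incompatible file loadable, so retraining in the new environment may still be needed.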
Summary
Saving pipelines lets you reuse trained models and preprocessing easily.
Use joblib.dump() and joblib.load() to save and load pipelines.
Loading a saved pipeline lets you predict on new data without retraining.
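To illustrate the last point, the sketch below trains and saves a pipeline once, then uses a separate helper that only loads the file and predicts. The helper name predict_species, the temporary file, and the sample measurements are assumptions made for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()


def predict_species(path, measurements):
    """Load a saved pipeline and return a class name for each row (hypothetical helper)."""
    loaded = joblib.load(path)
    class_ids = loaded.predict(np.asarray(measurements))
    return [str(iris.target_names[i]) for i in class_ids]


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'iris_pipeline.joblib')

    # Training happens once, before saving.
    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('clf', LogisticRegression(max_iter=200)),
    ])
    pipeline.fit(iris.data, iris.target)
    joblib.dump(pipeline, path)

    # Later use: no fit() call, only load and predict.
    # One new flower: sepal length, sepal width, petal length, petal width (cm).
    species = predict_species(path, [[5.1, 3.5, 1.4, 0.2]])

print(species)
```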