
Saving pipelines (joblib, pickle) in ML Python - ML Experiment: Train & Evaluate

Experiment - Saving pipelines (joblib, pickle)
Problem: You have trained a machine learning pipeline that preprocesses data and fits a model. You want to save this pipeline to disk so you can reuse it later without retraining.
Current Metrics: Training accuracy: 92%, Validation accuracy: 89%
Issue: The pipeline currently exists only in memory. If you close your program, you lose the trained pipeline and must retrain it every time.
Your Task
Save the trained pipeline to disk using joblib or pickle, then load it back and verify it produces the same predictions on test data.
Use either joblib or pickle for saving and loading.
Do not retrain the model after loading.
Ensure the loaded pipeline gives identical predictions to the original.
Solution
ML Python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import joblib

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(random_state=42))
])

# Train pipeline
pipeline.fit(X_train, y_train)

# Save pipeline to disk
joblib.dump(pipeline, 'iris_pipeline.joblib')

# Load pipeline from disk
loaded_pipeline = joblib.load('iris_pipeline.joblib')

# Predict with original and loaded pipeline
original_preds = pipeline.predict(X_test)
loaded_preds = loaded_pipeline.predict(X_test)

# Check if predictions are the same
predictions_match = np.array_equal(original_preds, loaded_preds)

print(f'Predictions match after loading: {predictions_match}')
Added code to save the trained pipeline using joblib.dump() to a file named 'iris_pipeline.joblib'.
Added code to load the pipeline back using joblib.load() from the saved file.
Compared predictions from the original and loaded pipeline to verify they match.
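The check above compares predicted labels only. A slightly stricter check, sketched below, also compares the fitted parameters and the predicted probabilities. It assumes the same iris pipeline as the solution and writes to a temporary directory (an illustrative choice, not part of the original exercise) so no file is left behind.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same data split and pipeline as the solution
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(random_state=42)),
])
pipeline.fit(X_train, y_train)

# Round-trip through a file in a temporary directory (assumed location)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "iris_pipeline.joblib")
    joblib.dump(pipeline, path)
    loaded_pipeline = joblib.load(path)

# Compare the fitted state of each step, not just the final labels
scaler_mean_match = np.array_equal(
    pipeline.named_steps["scaler"].mean_,
    loaded_pipeline.named_steps["scaler"].mean_,
)
coef_match = np.array_equal(
    pipeline.named_steps["clf"].coef_,
    loaded_pipeline.named_steps["clf"].coef_,
)
# Compare class probabilities, which are more sensitive than labels
proba_match = np.allclose(
    pipeline.predict_proba(X_test),
    loaded_pipeline.predict_proba(X_test),
)

print(f"Scaler means match: {scaler_mean_match}")
print(f"Classifier coefficients match: {coef_match}")
print(f"Predicted probabilities match: {proba_match}")
```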
Results Interpretation

Before saving: The pipeline existed only in memory and would be lost when the program ends.

After saving and loading: The pipeline is stored on disk and can be reloaded to make predictions without retraining.

This is confirmed by identical predictions on test data before and after loading.

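Saving and loading pipelines with joblib or pickle lets you reuse a trained model without retraining, and the reloaded pipeline reproduces the original's predictions. Both approaches rely on Python pickling, so load only files you trust, and load them with the same library versions (for example scikit-learn) that were used when saving.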
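One way to put this into practice in a later session is a small "load if saved, otherwise train and save" helper. The sketch below is one possible shape, not a prescribed pattern: the function name load_or_train is hypothetical, and the temporary directory stands in for wherever you would actually keep the file.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def load_or_train(path):
    """Hypothetical helper: return (pipeline, was_loaded_from_disk)."""
    if os.path.exists(path):
        # A saved pipeline exists, so skip training entirely
        return joblib.load(path), True

    # No saved pipeline yet: train once, then persist it
    X, y = load_iris(return_X_y=True)
    X_train, _, y_train, _ = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    pipeline = Pipeline([
        ("scaler", StandardScaler()),
        ("clf", LogisticRegression(random_state=42)),
    ])
    pipeline.fit(X_train, y_train)
    joblib.dump(pipeline, path)
    return pipeline, False


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "iris_pipeline.joblib")
    first, first_loaded = load_or_train(path)    # trains and saves
    second, second_loaded = load_or_train(path)  # loads from disk

X_all, _ = load_iris(return_X_y=True)
same_predictions = np.array_equal(first.predict(X_all), second.predict(X_all))
print(f"First call loaded from disk: {first_loaded}")
print(f"Second call loaded from disk: {second_loaded}")
print(f"Predictions match: {same_predictions}")
```

The second call returns without fitting anything, which is the behaviour the task asks for: no retraining after loading.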
Bonus Experiment
Try saving and loading the pipeline using pickle instead of joblib. Compare file sizes and loading speed.
💡 Hint
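Use pickle.dump() and pickle.load() with files opened in binary mode ('wb' for writing, 'rb' for reading). Joblib is often more efficient for objects that hold large NumPy arrays, and it can write smaller files when its compress option is used.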
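A minimal sketch of the bonus experiment follows. It assumes the same iris pipeline as the solution; the file names and the temporary directory are illustrative. The pipeline is small, so the size and timing differences may be negligible or vary between runs; a single timed load is only a rough indication.

```python
import os
import pickle
import tempfile
import time

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(random_state=42)),
])
pipeline.fit(X_train, y_train)

with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, "iris_pipeline.pkl")
    joblib_path = os.path.join(tmp_dir, "iris_pipeline.joblib")

    # pickle needs an explicit file handle opened in binary mode
    with open(pickle_path, "wb") as f:
        pickle.dump(pipeline, f)
    # joblib accepts a file path directly
    joblib.dump(pipeline, joblib_path)

    pickle_size = os.path.getsize(pickle_path)
    joblib_size = os.path.getsize(joblib_path)

    # Time one load with each library
    start = time.perf_counter()
    with open(pickle_path, "rb") as f:
        pickle_loaded = pickle.load(f)
    pickle_seconds = time.perf_counter() - start

    start = time.perf_counter()
    joblib_loaded = joblib.load(joblib_path)
    joblib_seconds = time.perf_counter() - start

print(f"pickle: {pickle_size} bytes, loaded in {pickle_seconds:.6f} s")
print(f"joblib: {joblib_size} bytes, loaded in {joblib_seconds:.6f} s")

# Both loaded pipelines should reproduce the original predictions
original_preds = pipeline.predict(X_test)
pickle_match = np.array_equal(original_preds, pickle_loaded.predict(X_test))
joblib_match = np.array_equal(original_preds, joblib_loaded.predict(X_test))
print(f"pickle predictions match: {pickle_match}")
print(f"joblib predictions match: {joblib_match}")
```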