
Saving pipelines (joblib, pickle) in ML Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
What is the output of this pipeline saving code?

Consider the following Python code that trains a simple pipeline and saves it using joblib. What will be the output when loading and predicting with the saved pipeline?

ML Python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import joblib

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression())
])

X_train = [[0, 0], [1, 1], [2, 2], [3, 3]]
y_train = [0, 0, 1, 1]

pipeline.fit(X_train, y_train)
joblib.dump(pipeline, 'model.joblib')

loaded_pipeline = joblib.load('model.joblib')
pred = loaded_pipeline.predict([[1.5, 1.5]])
print(pred[0])
A. 1
B. 0
C. Raises FileNotFoundError
D. Raises AttributeError
💡 Hint

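The file is written and read back in the same script, so loading itself does not fail. The input [1.5, 1.5] sits midway between the samples labeled 0 and those labeled 1, so think about which side of the decision boundary the fitted model places it on.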
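To make the save and reload step concrete, here is a minimal sketch of a round trip with joblib, using the same training data as the question. The temporary directory and the two extra inputs in `X_new` are assumptions added for illustration; they are chosen to lie clearly on either side of the class boundary.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_train = [[0, 0], [1, 1], [2, 2], [3, 3]]
y_train = [0, 0, 1, 1]

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])
pipeline.fit(X_train, y_train)

# Hypothetical inputs, one well inside each class region.
X_new = [[0.5, 0.5], [2.5, 2.5]]

# Save to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.joblib")
    joblib.dump(pipeline, path)
    loaded_pipeline = joblib.load(path)

# The reloaded pipeline should reproduce the original predictions.
original_pred = pipeline.predict(X_new)
loaded_pred = loaded_pipeline.predict(X_new)
same_predictions = np.array_equal(original_pred, loaded_pred)
print(loaded_pred, same_predictions)
```

The reloaded object carries both the fitted scaler and the fitted classifier, so no refitting is needed before calling `predict`.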

Model Choice
intermediate
Which method is best for saving a scikit-learn pipeline?

You have a trained scikit-learn pipeline. Which method is recommended to save and later reload the entire pipeline with minimal hassle?

A. Use pickle.dump() and pickle.load() to save and load the pipeline.
B. Save the pipeline as a CSV file.
C. Save only the pipeline parameters as JSON and reconstruct the pipeline manually.
D. Use joblib.dump() and joblib.load() to save and load the pipeline.
💡 Hint

Consider which method is optimized for large NumPy arrays inside models.
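For comparison, the sketch below saves one fitted pipeline with both pickle and joblib and reloads each copy. The toy data, file names, and temporary directory are assumptions for illustration. Both approaches can round-trip a pipeline; the difference lies mainly in how large arrays are handled.

```python
import os
import pickle
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data, assumed for illustration only.
X = [[0, 0], [1, 1], [2, 2], [3, 3]]
y = [0, 0, 1, 1]

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
]).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    pickle_path = os.path.join(tmp, "model.pkl")
    joblib_path = os.path.join(tmp, "model.joblib")

    # Standard-library pickle needs an explicit binary file handle.
    with open(pickle_path, "wb") as f:
        pickle.dump(pipeline, f)
    with open(pickle_path, "rb") as f:
        from_pickle = pickle.load(f)

    # joblib takes a path directly.
    joblib.dump(pipeline, joblib_path)
    from_joblib = joblib.load(joblib_path)

X_check = [[0.5, 0.5], [2.5, 2.5]]
pickle_matches = np.array_equal(pipeline.predict(X_check), from_pickle.predict(X_check))
joblib_matches = np.array_equal(pipeline.predict(X_check), from_joblib.predict(X_check))
print(pickle_matches, joblib_matches)
```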

Hyperparameter
advanced
Which hyperparameter affects pipeline saving compatibility?

When saving a scikit-learn pipeline with joblib, which of the following choices in the pipeline's components can cause issues when the saved pipeline is loaded in a different environment?

A. Setting random_state to a fixed integer.
B. Using custom transformer classes not defined in the loading environment.
C. Setting verbose=True in pipeline steps.
D. Using n_jobs with a value other than 1.
💡 Hint

Think about what happens if the code that defines a custom class is missing when loading.
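The sketch below illustrates the failure mode with the standard-library pickle module, which joblib builds on. A pickled object stores only a reference to its class (module name plus class name), not the class code itself. The module name `hypothetical_custom_steps` and the class `AddOne` are made up for this example, and removing the module from `sys.modules` stands in for loading in an environment where the defining code is absent.

```python
import pickle
import sys
import types

# Hypothetical module name standing in for the file that defines a custom step.
MODULE_NAME = "hypothetical_custom_steps"

# Build the module in memory and define a small transformer-like class in it.
module = types.ModuleType(MODULE_NAME)
exec(
    "class AddOne:\n"
    "    def fit(self, X, y=None):\n"
    "        return self\n"
    "    def transform(self, X):\n"
    "        return [[v + 1 for v in row] for row in X]\n",
    module.__dict__,
)
sys.modules[MODULE_NAME] = module

# Serialize an instance; only a reference to the class is stored.
payload = pickle.dumps(module.AddOne())

# While the defining module is importable, loading works.
restored = pickle.loads(payload)
restored_output = restored.transform([[1, 2]])

# Simulate a different environment where the module is not available.
del sys.modules[MODULE_NAME]
try:
    pickle.loads(payload)
    error_name = None
except ModuleNotFoundError as exc:
    error_name = type(exc).__name__

print(restored_output, error_name)
```

A common way to avoid this is to keep custom transformers in an importable package that is installed wherever the pipeline is loaded.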

🔧 Debug
advanced
Why does this pipeline loading code raise an error?

Given the code below, why does loading the saved pipeline raise an error?

ML Python
import joblib

loaded_pipeline = joblib.load('saved_pipeline.pkl')
pred = loaded_pipeline.predict([[0, 0]])
print(pred)
A. The file 'saved_pipeline.pkl' does not exist in the current directory.
B. The file was saved with pickle, not joblib, causing incompatibility.
C. The pipeline was saved with joblib but the file extension is incorrect.
D. The pipeline was saved with a different Python version causing incompatibility.
💡 Hint

Check if the file path and name are correct and the file exists.
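One defensive pattern is to check the path before loading, so a missing file produces a clear message that includes the resolved location. The helper `load_pipeline` and the placeholder object saved below are hypothetical and only serve to exercise both branches.

```python
import tempfile
from pathlib import Path

import joblib


def load_pipeline(path):
    """Load a joblib file, failing early with a clear message if it is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No saved pipeline found at: {path.resolve()}")
    return joblib.load(path)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "saved_pipeline.pkl"

    # Nothing has been saved yet, so the helper raises.
    try:
        load_pipeline(target)
        missing_error = None
    except FileNotFoundError as exc:
        missing_error = str(exc)

    # After saving an object (a placeholder dict here), loading succeeds.
    joblib.dump({"placeholder": "object"}, target)
    reloaded = load_pipeline(target)

print(missing_error is not None, reloaded)
```

Because relative paths are resolved against the current working directory, printing the resolved path often reveals that a script was launched from a different folder than expected.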

🧠 Conceptual
expert
What is a key advantage of using joblib over pickle for saving ML pipelines?

Why is joblib often preferred over pickle for saving machine learning pipelines that include large NumPy arrays?

A. Joblib converts pipelines to JSON format for better readability.
B. Joblib encrypts the saved files for security, unlike pickle.
C. Joblib stores large NumPy arrays efficiently, offers optional compression through the compress argument, and supports memory mapping of uncompressed files for faster loading of large arrays.
D. Joblib saves pipelines as plain text files for easier debugging.
💡 Hint

Think about performance and file size when saving large data.
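As a rough illustration of the file size and loading trade-off, the sketch below saves the same array with and without compression and then memory-maps the uncompressed copy. The all-zeros array is an assumption chosen because it compresses very well; real model arrays will usually shrink less. Note that compression is opt-in through the `compress` argument, and `mmap_mode` applies to uncompressed files.

```python
import os
import tempfile

import joblib
import numpy as np

# Highly compressible stand-in for a large model array (assumption).
data = np.zeros((1000, 100))

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "raw.joblib")
    zipped_path = os.path.join(tmp, "zipped.joblib")

    joblib.dump(data, raw_path)                  # no compression by default
    joblib.dump(data, zipped_path, compress=3)   # opt-in compression level

    raw_size = os.path.getsize(raw_path)
    zipped_size = os.path.getsize(zipped_path)

    # Memory-map the uncompressed file instead of reading it fully into memory.
    mapped = joblib.load(raw_path, mmap_mode="r")
    mapped_sum = float(mapped.sum())
    del mapped  # release the mapping before the directory is cleaned up

print(raw_size, zipped_size, mapped_sum)
```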