
Saving pipelines (joblib, pickle) in ML Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
What is the output of this pipeline saving code?

Consider the following Python code that trains a simple pipeline and saves it using joblib. What will be the output when loading and predicting with the saved pipeline?

ML Python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import joblib

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression())
])

X_train = [[0, 0], [1, 1], [2, 2], [3, 3]]
y_train = [0, 0, 1, 1]

pipeline.fit(X_train, y_train)
joblib.dump(pipeline, 'model.joblib')

loaded_pipeline = joblib.load('model.joblib')
pred = loaded_pipeline.predict([[1.5, 1.5]])
print(pred[0])
A. 1
B. 0
C. Raises FileNotFoundError
D. Raises AttributeError
💡 Hint

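A loaded pipeline predicts exactly what the fitted pipeline would, so no error is raised. Note that [1.5, 1.5] is the midpoint of the training data, equally far from the samples labeled 0 and the samples labeled 1, so the predicted label depends on which side of the decision boundary the fitted model places it.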

Model Choice
intermediate
Which method is best for saving a scikit-learn pipeline?

You have a trained scikit-learn pipeline. Which method is recommended to save and later reload the entire pipeline with minimal hassle?

A. Use pickle.dump() and pickle.load() to save and load the pipeline.
B. Save the pipeline as a CSV file.
C. Save only the pipeline parameters as JSON and reconstruct the pipeline manually.
D. Use joblib.dump() and joblib.load() to save and load the pipeline.
💡 Hint

Consider which method is optimized for large numpy arrays inside models.

Hyperparameter
advanced
Which factor affects pipeline saving compatibility?

When saving a scikit-learn pipeline with joblib, which of the following can cause issues when loading the saved pipeline in a different environment?

A. Setting random_state to a fixed integer.
B. Using custom transformer classes not defined in the loading environment.
C. Setting verbose=True in pipeline steps.
D. Using n_jobs with a value other than 1.
💡 Hint

Think about what happens if the code that defines a custom class is missing when loading.
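
The missing-class situation can be reproduced with the standard library alone. The sketch below is a minimal illustration, not a scikit-learn example: it uses pickle directly (joblib builds on pickle, so the same lookup applies), and the module name demo_custom_steps and the class AddOne are hypothetical names invented for the demonstration.

```python
import importlib
import pickle
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical module holding a custom step; the names are made up for this demo.
source = textwrap.dedent('''
    class AddOne:
        def transform(self, values):
            return [v + 1 for v in values]
''')

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_custom_steps.py").write_text(source)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    module = importlib.import_module("demo_custom_steps")

    # Pickle stores a reference to demo_custom_steps.AddOne, not the class code.
    payload = pickle.dumps(module.AddOne())

    # Loading works while the defining module is importable.
    restored_output = pickle.loads(payload).transform([1, 2])

    # Simulate a different environment where the module is missing.
    sys.path.remove(tmp)
    del sys.modules["demo_custom_steps"]
    importlib.invalidate_caches()
    try:
        pickle.loads(payload)
        load_error = None
    except ModuleNotFoundError as exc:
        load_error = exc

print(restored_output)
print(type(load_error).__name__)
```

The practical consequence is that the code defining a custom transformer has to be importable, under the same module path, wherever the saved pipeline is loaded.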

🔧 Debug
advanced
Why does this pipeline loading code raise an error?

Given the code below, why does loading the saved pipeline raise an error?

ML Python
import joblib

loaded_pipeline = joblib.load('saved_pipeline.pkl')
pred = loaded_pipeline.predict([[0, 0]])
print(pred)
A. The file 'saved_pipeline.pkl' does not exist in the current directory.
B. The file was saved with pickle, not joblib, causing incompatibility.
C. The pipeline was saved with joblib but the file extension is incorrect.
D. The pipeline was saved with a different Python version causing incompatibility.
💡 Hint

Check if the file path and name are correct and the file exists.

🧠 Conceptual
expert
What is a key advantage of using joblib over pickle for saving ML pipelines?

Why is joblib often preferred over pickle for saving machine learning pipelines that include large numpy arrays?

A. Joblib converts pipelines to JSON format for better readability.
B. Joblib encrypts the saved files for security, unlike pickle.
C. Joblib handles large numpy arrays efficiently, offering optional compression (the compress argument) and memory mapping (the mmap_mode argument, for uncompressed files) for faster loading of large arrays.
D. Joblib saves pipelines as plain text files for easier debugging.
💡 Hint

Think about performance and file size when saving large data.
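
To make the file-size and loading trade-off concrete, here is a small sketch that assumes joblib and numpy are installed. It writes the same array with and without the compress argument of joblib.dump, then reloads the uncompressed file with mmap_mode="r". The array shape, compression level, and file names are arbitrary choices for illustration.

```python
import os
import tempfile

import joblib
import numpy as np

# A highly compressible stand-in for large model weights.
weights = np.zeros((200, 200))

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "weights_plain.joblib")
    packed_path = os.path.join(tmp, "weights_packed.joblib")

    joblib.dump(weights, plain_path)               # no compression by default
    joblib.dump(weights, packed_path, compress=3)  # compression is opt-in

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)

    # Memory mapping applies to the uncompressed file.
    mapped = joblib.load(plain_path, mmap_mode="r")
    is_memmap = isinstance(mapped, np.memmap)
    same_values = bool(np.array_equal(mapped, weights))
    del mapped  # release the mapping before the directory is removed

print(plain_size, packed_size, is_memmap, same_values)
```

Compression shrinks the file at the cost of extra work when saving and loading, while memory mapping avoids reading the whole array into memory up front, so the two options serve different goals.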

Practice

1. What is the main purpose of saving a machine learning pipeline using joblib or pickle?
easy
A. To visualize the model architecture
B. To increase the training speed of the model
C. To reuse the trained model and preprocessing steps without retraining
D. To automatically tune hyperparameters

Solution

  1. Step 1: Understand what saving a pipeline means

    Saving a pipeline stores the trained model and preprocessing steps so you don't have to train again.
  2. Step 2: Identify the main benefit

    This allows you to reuse the pipeline later for predictions without retraining, saving time and effort.
  3. Final Answer:

    To reuse the trained model and preprocessing steps without retraining -> Option C
  4. Quick Check:

    Saving pipeline = reuse trained model [OK]
Hint: Saving pipelines means reusing models without retraining [OK]
Common Mistakes:
  • Thinking saving speeds up training
  • Confusing saving with visualization
  • Assuming saving tunes hyperparameters
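
As a quick illustration of reuse without retraining, the following sketch (assuming scikit-learn, joblib, and numpy are installed) fits a small pipeline, saves it, reloads it, and compares predictions. The training data and the query points are made up for the example.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data invented for this sketch.
X_train = [[0, 0], [1, 1], [4, 4], [5, 5]]
y_train = [0, 0, 1, 1]
X_new = [[0.5, 0.5], [4.5, 4.5]]

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])
pipeline.fit(X_train, y_train)
original_pred = pipeline.predict(X_new)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.joblib")
    joblib.dump(pipeline, path)    # stores the fitted scaler and classifier together
    reloaded = joblib.load(path)   # no call to fit() is needed after loading

reloaded_pred = reloaded.predict(X_new)
print(np.array_equal(original_pred, reloaded_pred))
```

Because the scaler's fitted statistics travel with the classifier, new data passes through the same preprocessing it would have seen at training time.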
2. Which of the following is the correct syntax to save a trained pipeline named pipe to a file called model.pkl using joblib?
easy
A. joblib.dump(pipe, 'model.pkl')
B. joblib.store(pipe, 'model.pkl')
C. joblib.write(pipe, 'model.pkl')
D. joblib.save(pipe, 'model.pkl')

Solution

  1. Step 1: Recall the correct joblib function for saving

    The function to save an object with joblib is dump(), not save, write, or store.
  2. Step 2: Match the syntax

    The correct syntax is joblib.dump(pipe, 'model.pkl') to save the pipeline to a file.
  3. Final Answer:

    joblib.dump(pipe, 'model.pkl') -> Option A
  4. Quick Check:

    Save with joblib.dump() [OK]
Hint: Use joblib.dump() to save pipelines [OK]
Common Mistakes:
  • Using joblib.save() which does not exist
  • Confusing dump() with write() or store()
  • Incorrect argument order
3. Given the following code, what will be the output?
import joblib
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression())
])

pipe.fit([[0, 0], [1, 1]], [0, 1])
joblib.dump(pipe, 'pipe.pkl')
loaded_pipe = joblib.load('pipe.pkl')
pred = loaded_pipe.predict([[2, 2]])
print(pred)
medium
A. [0]
B. [1]
C. Error: File not found
D. [0 1]

Solution

  1. Step 1: Understand the pipeline training

    The pipeline is trained on two points: [0,0] labeled 0 and [1,1] labeled 1, so it learns to classify higher values as 1.
  2. Step 2: Predict using loaded pipeline

    After saving and loading, the pipeline predicts on [2,2], which is closer to class 1, so prediction is [1].
  3. Final Answer:

    [1] -> Option B
  4. Quick Check:

    Loaded pipeline predicts class 1 for [2,2] [OK]
Hint: Loaded pipeline predicts same as original model [OK]
Common Mistakes:
  • Expecting error due to file handling
  • Confusing prediction output format
  • Assuming prediction is [0]
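
The scaling step behind this answer can be checked by hand. The sketch below reproduces, with the standard library only, what StandardScaler does to one feature column: it subtracts the training mean and divides by the population standard deviation. It is an arithmetic illustration rather than a call into scikit-learn.

```python
from statistics import fmean, pstdev

# One feature column of the training data [[0, 0], [1, 1]].
train_column = [0, 1]

mean = fmean(train_column)   # centre of the training values
std = pstdev(train_column)   # population standard deviation

# The two training points after scaling.
scaled_train = [(v - mean) / std for v in train_column]

# The query value 2 after the same transformation.
scaled_query = (2 - mean) / std

print(scaled_train, scaled_query)
```

The training points land at -1 and 1, while the query lands at 3, well past the class-1 training point, which is why the classifier assigns it to class 1.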
4. You tried to load a saved pipeline using loaded_pipe = joblib.load('pipeline.pkl') but got a FileNotFoundError. What is the most likely cause?
medium
A. The file pipeline.pkl does not exist in the current directory
B. The pipeline was not trained before saving
C. The joblib.load function is used incorrectly
D. The pipeline file is corrupted and cannot be loaded

Solution

  1. Step 1: Understand FileNotFoundError meaning

    This error means the file specified does not exist at the given path.
  2. Step 2: Identify the most common cause

    Usually, the file is missing or the path is wrong, so the file pipeline.pkl is not found in the current directory.
  3. Final Answer:

    The file pipeline.pkl does not exist in the current directory -> Option A
  4. Quick Check:

    FileNotFoundError = missing file [OK]
Hint: FileNotFoundError means file path is wrong or missing [OK]
Common Mistakes:
  • Assuming pipeline not trained causes this error
  • Thinking joblib.load syntax is wrong
  • Assuming file corruption without checking file presence
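
One way to make this error easier to diagnose is to resolve the path before loading. The helper below, resolve_model_path, is a hypothetical name introduced for this sketch. It reports the absolute path and the current working directory when the file is missing. Its return value could then be passed to joblib.load.

```python
import tempfile
from pathlib import Path


def resolve_model_path(filename):
    """Return the absolute path of a saved model, or raise a descriptive error."""
    path = Path(filename).resolve()
    if not path.is_file():
        raise FileNotFoundError(
            f"No saved pipeline at {path} (current directory: {Path.cwd()})"
        )
    return path


with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp, "pipeline.pkl")
    present.write_bytes(b"")  # empty placeholder file, only its presence matters here

    found = resolve_model_path(present)

    try:
        resolve_model_path(Path(tmp, "missing.pkl"))
        message = None
    except FileNotFoundError as exc:
        message = str(exc)

print(found.name)
print(message is not None)
```

Printing the working directory in the message helps in the common case where a script is launched from a different folder than the one the file was saved in.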
5. You have a pipeline that includes a scaler and a classifier. You want to save it and later load it to predict on new data. Which of the following code snippets correctly saves and loads the pipeline, then predicts on new data [[5, 5]]?
hard
A.
    import pickle
    pickle.dump(pipeline, 'model.pkl')
    loaded = pickle.load('model.pkl')
    pred = loaded.predict([[5, 5]])
    print(pred)
B.
    import pickle
    pickle.load(pipeline, 'model.pkl')
    loaded = pickle.load('model.pkl')
    pred = loaded.predict([[5, 5]])
    print(pred)
C.
    import joblib
    joblib.save(pipeline, 'model.pkl')
    loaded = joblib.load('model.pkl')
    pred = loaded.predict([[5, 5]])
    print(pred)
D.
    import joblib
    joblib.dump(pipeline, 'model.joblib')
    loaded = joblib.load('model.joblib')
    pred = loaded.predict([[5, 5]])
    print(pred)

Solution

  1. Step 1: Check saving syntax correctness

    Option D uses joblib.dump(pipeline, 'model.joblib') to save the pipeline and joblib.load('model.joblib') to load it, both with correct syntax.
  2. Step 2: Verify prediction step

    After loading, it calls predict on new data correctly and prints the result.
  3. Step 3: Identify errors in other options

    Option B wrongly calls pickle.load to save the pipeline. Option C calls joblib.save, which does not exist. Option A passes file names to pickle.dump and pickle.load, but both require file objects from open(), in 'wb' mode for saving and 'rb' mode for loading.
  4. Final Answer:

    joblib.dump(pipeline, 'model.joblib') followed by joblib.load('model.joblib') -> Option D
  5. Quick Check:

    Use joblib.dump/load with correct syntax [OK]
Hint: Use joblib.dump() and joblib.load() with correct syntax [OK]
Common Mistakes:
  • Using joblib.save() which does not exist
  • Confusing pickle.load() for saving
  • Not opening file when using pickle.load()
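
For comparison, here is how the pickle route looks when written correctly. This is a minimal sketch using only the standard library. The dictionary stand_in is a placeholder for a fitted pipeline, chosen so the example runs without scikit-learn.

```python
import os
import pickle
import tempfile

# Placeholder for a fitted pipeline; any picklable object behaves the same way.
stand_in = {"steps": ["scaler", "clf"]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")

    # Passing a file name instead of a file object fails.
    try:
        pickle.dump(stand_in, path)
        dump_error = None
    except TypeError as exc:
        dump_error = exc

    # Correct usage: open the file in binary mode and pass the file object.
    with open(path, "wb") as f:
        pickle.dump(stand_in, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(type(dump_error).__name__)
print(restored == stand_in)
```

joblib.dump and joblib.load accept a file name directly, which is part of why they are the lower-friction choice in the options above.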