
How to Use make_pipeline in sklearn with Python

Use make_pipeline from sklearn.pipeline to chain multiple data processing and modeling steps into one object. It simplifies workflows by automatically naming steps and allowing you to fit and predict with a single call.

Syntax

The make_pipeline function builds a Pipeline object from the steps you pass as positional arguments. Each argument is a step, such as a scaler or a model. When you call fit or predict, the pipeline runs the steps in order: data passes through each transformer, and the output of the last transformer goes to the final estimator.

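Parts of the call:

  • make_pipeline(step1, step2, ..., stepN): builds and returns a Pipeline object.
  • step1, step2, ..., stepN: transformer or estimator instances, passed in the order they should run.
  • Every step except the last must be a transformer with fit and transform methods.
  • The final step only needs a fit method. It is usually an estimator with predict, and the pipeline exposes predict only when the final step has it.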
python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipeline = make_pipeline(StandardScaler(), LogisticRegression())
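
To see the names that make_pipeline generates, it can help to inspect the resulting object. The sketch below is a minimal illustration that relies only on the steps and named_steps attributes of Pipeline; the variable names are chosen here for the example.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipeline = make_pipeline(StandardScaler(), LogisticRegression())

# Each step is stored as a (name, estimator) pair; names are lowercased class names.
step_names = [name for name, _ in pipeline.steps]
print(step_names)

# named_steps allows lookup of a step by its generated name.
scaler = pipeline.named_steps["standardscaler"]

# When the same class appears more than once, numeric suffixes keep the names unique.
repeated = make_pipeline(StandardScaler(), StandardScaler(), LogisticRegression())
repeated_names = [name for name, _ in repeated.steps]
print(repeated_names)
```

Printing the names before accessing a step avoids guessing, especially when a class is repeated.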

Example

This example shows how to create a pipeline that scales data and then fits a logistic regression model. It fits the pipeline on training data and predicts on test data.

python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
X, y = load_iris(return_X_y=True)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Create pipeline
pipeline = make_pipeline(StandardScaler(), LogisticRegression(random_state=42))

# Fit pipeline
pipeline.fit(X_train, y_train)

# Predict
y_pred = pipeline.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
Output
Accuracy: 1.00
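
To make the "single call" idea concrete, the sketch below compares the pipeline with the equivalent manual steps on the same iris split. It is an illustration rather than a benchmark; the names manual_pred and pipeline_pred are chosen here, and both versions use the default LogisticRegression settings.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Manual version: fit the scaler on training data only, then reuse it on test data.
scaler = StandardScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)
manual_pred = model.predict(scaler.transform(X_test))

# Pipeline version: the same sequence behind one fit call and one predict call.
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline_pred = pipeline.fit(X_train, y_train).predict(X_test)

# Both approaches should produce the same predictions.
same = np.array_equal(manual_pred, pipeline_pred)
print(same)
```

The pipeline removes the bookkeeping of applying the fitted scaler to the test data, which is an easy step to forget or get wrong by hand.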

Common Pitfalls

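Common mistakes when using make_pipeline include:

  • Ending the pipeline with a transformer and then calling predict. The pipeline builds and fits, but predict is available only when the last step has a predict method.
  • Passing raw functions or objects without fit/transform methods. Every step before the last must be a transformer.
  • Guessing step names. make_pipeline generates them from the lowercased class names, and repeated classes get numeric suffixes, so check pipeline.named_steps before accessing a step by name.
  • Expecting to set custom step names. make_pipeline does not accept them; use Pipeline instead.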
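
One way to use a plain function as a step is to wrap it in FunctionTransformer from sklearn.preprocessing, which supplies the fit and transform methods a pipeline expects. The sketch below is a minimal illustration; the toy data and the choice of np.log1p are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

# Toy data, invented for illustration: two non-negative features, binary labels.
X = np.array([[1.0, 10.0], [2.0, 20.0], [30.0, 300.0], [40.0, 400.0]])
y = np.array([0, 0, 1, 1])

# np.log1p alone has no fit/transform; FunctionTransformer supplies them.
log_step = FunctionTransformer(np.log1p)
pipeline = make_pipeline(log_step, LogisticRegression())

pipeline.fit(X, y)
pred = pipeline.predict(X)
```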

Example of wrong and right usage:

python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Wrong: the final step is a transformer, so the pipeline has no predict method.
# Building and fitting succeed; calling pipeline.predict(X) raises AttributeError.
pipeline = make_pipeline(StandardScaler(), StandardScaler())

# Right: the final step is an estimator with predict
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
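
A pipeline whose last step is a transformer is not rejected when it is built; the problem only appears when predict is requested. The sketch below illustrates this with toy data invented for the example, and uses MinMaxScaler as the second transformer purely as an assumption.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # toy data for illustration

# A pipeline made only of transformers builds and fits without error.
transform_only = make_pipeline(StandardScaler(), MinMaxScaler())
X_scaled = transform_only.fit_transform(X)

# predict is exposed only when the final step provides it.
has_predict = hasattr(transform_only, "predict")
print(has_predict)
```

Such a pipeline is still useful as a preprocessing object, as long as it is used through transform rather than predict.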

Quick Reference

Tips for using make_pipeline:

  • Use it to quickly chain preprocessing and modeling steps.
  • Steps are named automatically by their class names in lowercase.
  • Use pipeline.fit(X, y) to train all steps at once.
  • Use pipeline.predict(X) to get predictions from the final estimator.
  • For custom step names or more control, use Pipeline instead.
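
The difference between generated and custom names shows up when setting parameters on a step, which uses the step name followed by a double underscore. The sketch below is a minimal comparison; the step names "scale" and "clf" and the value C=0.5 are arbitrary choices made for the example.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# Pipeline takes explicit (name, estimator) pairs; "scale" and "clf" are names chosen here.
named = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
named.set_params(clf__C=0.5)

# The same setting on a make_pipeline object uses the generated step name.
auto = make_pipeline(StandardScaler(), LogisticRegression())
auto.set_params(logisticregression__C=0.5)

named_C = named.named_steps["clf"].C
auto_C = auto.named_steps["logisticregression"].C
```

Short custom names keep parameter strings readable, which matters most when the same names are reused in a parameter grid.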

Key Takeaways

Use make_pipeline to chain transformers and an estimator into one object for easy training and prediction.
The final step is usually an estimator with a predict method; a pipeline that ends in a transformer has no predict.
make_pipeline auto-names steps based on class names; use Pipeline for custom names.
Fit and predict on the pipeline just like a single model.
Common errors include calling predict on a pipeline that ends in a transformer and passing objects without fit/transform methods as steps.