Introduction
A pipeline chains preprocessing steps and a model into a single object, so the same steps run in the same order during training and prediction.
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('step_name1', transformer1),
    ('step_name2', transformer2),
    ('model', estimator)
])
Each step has a name and a transformer or model.
The last step is usually the model that makes predictions.
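To make the name and step pairing concrete, here is a minimal sketch that builds a two-step pipeline and looks its steps up by name. The step names 'scale' and 'model' are arbitrary labels chosen for illustration; any unique strings would work.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipeline = Pipeline([
    ('scale', StandardScaler()),
    ('model', LogisticRegression()),
])

# Steps are stored as (name, estimator) pairs, in the order they run
step_names = [name for name, _ in pipeline.steps]
print(step_names)

# A step can be retrieved by the name it was given
scaler = pipeline.named_steps['scale']
print(type(scaler).__name__)

# The final step is the estimator that makes predictions
final_name, final_estimator = pipeline.steps[-1]
print(final_name, type(final_estimator).__name__)
```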
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipeline = Pipeline([
    ('scale', StandardScaler()),
    ('model', LogisticRegression())
])

from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier

pipeline = Pipeline([
    ('impute', SimpleImputer(strategy='mean')),
    ('model', RandomForestClassifier())
])
This program builds a pipeline that standardizes the iris features and trains a logistic regression model. It then predicts on the held-out test set and prints the accuracy.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('logreg', LogisticRegression(max_iter=200))
])

# Train model
pipeline.fit(X_train, y_train)

# Predict
y_pred = pipeline.predict(X_test)

# Measure accuracy
acc = accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc:.2f}")
Always name your pipeline steps clearly for easy understanding.
Use pipelines to avoid data leakage by fitting transformers only on training data.
Pipelines make it easy to try different models or preprocessing by swapping steps.
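The notes on data leakage and swapping steps can be sketched together as follows. This is an illustration under assumptions, not the only way to do it: the step names 'scaler' and 'model', the iris data, and the choice of RandomForestClassifier as the replacement model are all picked here for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LogisticRegression(max_iter=200)),
])
pipeline.fit(X_train, y_train)

# The scaler's statistics come from the training rows only,
# so nothing about X_test leaks into preprocessing.
scaler_mean = pipeline.named_steps['scaler'].mean_
uses_train_only = np.allclose(scaler_mean, X_train.mean(axis=0))
print(uses_train_only)

# Swap the final step by name; the scaler step is left unchanged.
pipeline.set_params(model=RandomForestClassifier(random_state=0))
pipeline.fit(X_train, y_train)
swapped_name = type(pipeline.named_steps['model']).__name__
print(swapped_name)
print(f"Test accuracy: {pipeline.score(X_test, y_test):.2f}")
```

Because the scaler lives inside the pipeline, calling fit on the training split is enough to keep test rows out of the preprocessing statistics, and set_params lets the model be replaced without rebuilding the other steps.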
Pipelines organize machine learning steps in order.
They help avoid mistakes and save time.
Use pipelines to make your work clear and repeatable.
What does print(pipe.named_steps['model'].coef_) output after fitting?

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('model', LogisticRegression())
])
X = [[1, 2], [2, 3], [3, 4], [4, 5]]
y = [0, 0, 1, 1]
pipe.fit(X, y)
print(pipe.named_steps['model'].coef_)

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('model', LogisticRegression())
])
pipe.fit(X, y)
pipe.predict(X_test)

X, y, and X_test are defined correctly.
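For the first of the two snippets above, a minimal runnable sketch follows. It reuses the same toy X and y and inspects only the shape of coef_, since the exact coefficient values may vary with the solver and scikit-learn version.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('model', LogisticRegression()),
])
X = [[1, 2], [2, 3], [3, 4], [4, 5]]
y = [0, 0, 1, 1]
pipe.fit(X, y)

# After fitting, the model step exposes its learned weights
coef = pipe.named_steps['model'].coef_

# Binary problem with two features: one row, one weight per feature
print(coef.shape)
```

The coefficients apply to the scaled features, because the model step only ever sees the output of the 'scale' step.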