
SciPy with scikit-learn pipeline

Introduction

We use SciPy functions inside scikit-learn pipelines to clean and prepare data step by step, then train a model on the result. A pipeline keeps the whole workflow organized and repeatable. Typical situations:

When you want to apply multiple data processing steps before training a model.
When you need to combine SciPy functions for data transformation with scikit-learn models.
When you want to avoid repeating code and make your workflow clear and simple.
When you want to test different data cleaning and modeling steps quickly.
When you want to share your data science process with others in a neat way.
Syntax
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.linear_model import LogisticRegression
import scipy  # or the specific SciPy submodule your transform needs

pipeline = Pipeline([
    ('scipy_transform', FunctionTransformer(your_scipy_function)),
    ('model', LogisticRegression())
])

pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)

Pipeline lets you chain steps in order: data transformations first, then model training.

FunctionTransformer wraps SciPy functions so they work inside the pipeline.
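To see that wrapping in isolation, here is a minimal sketch that applies scipy.stats.zscore through a FunctionTransformer on its own, outside any pipeline (the toy array is invented for illustration):

```python
import numpy as np
import scipy.stats
from sklearn.preprocessing import FunctionTransformer

# Wrap a SciPy function so it behaves like a scikit-learn transformer.
transformer = FunctionTransformer(scipy.stats.zscore)

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# fit_transform just calls the wrapped function; zscore standardizes
# each column to mean 0 and standard deviation 1 by default (axis=0).
X_t = transformer.fit_transform(X)
```

Because FunctionTransformer is stateless, fit here learns nothing; it exists so the object fits the estimator interface that Pipeline expects.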

Examples
This example applies a log transform using NumPy before training a logistic regression model.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.linear_model import LogisticRegression
import numpy as np

def log_transform(X):
    return np.log1p(X)

pipeline = Pipeline([
    ('log', FunctionTransformer(log_transform)),
    ('model', LogisticRegression())
])
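The pipeline above can then be fitted like any estimator. A self-contained sketch on synthetic skewed data (the dataset and target rule are invented here purely for illustration):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.linear_model import LogisticRegression

def log_transform(X):
    return np.log1p(X)

# Invented toy data: skewed, non-negative features and a binary target.
rng = np.random.default_rng(0)
X = rng.exponential(scale=5.0, size=(40, 3))
y = (X[:, 0] > 5.0).astype(int)

pipeline = Pipeline([
    ('log', FunctionTransformer(log_transform)),
    ('model', LogisticRegression())
])

# fit applies log1p to X, then trains the model on the transformed data;
# predict reapplies the same transform automatically.
pipeline.fit(X, y)
preds = pipeline.predict(X)
```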
This example smooths data using a SciPy Gaussian filter before modeling.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.linear_model import LogisticRegression
import scipy.ndimage

def smooth_data(X):
    # Note: with a scalar sigma, gaussian_filter smooths along every axis,
    # so values also blur across neighboring rows (samples); pass a
    # per-axis sigma such as (0, 1) to smooth features only.
    return scipy.ndimage.gaussian_filter(X, sigma=1)

pipeline = Pipeline([
    ('smooth', FunctionTransformer(smooth_data)),
    ('model', LogisticRegression())
])
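If each sample should be smoothed independently, a variant sketch uses gaussian_filter1d along the feature axis, so no information mixes across rows (smooth_rows and the spike array are names and data invented here):

```python
import numpy as np
import scipy.ndimage
from sklearn.preprocessing import FunctionTransformer

def smooth_rows(X):
    # Smooth only along axis=1 (features), leaving rows independent.
    return scipy.ndimage.gaussian_filter1d(X, sigma=1, axis=1)

transformer = FunctionTransformer(smooth_rows)

# One row with a single spike at index 4.
X = np.eye(1, 9, 4)

# The spike spreads into a symmetric Gaussian bump centered at index 4.
X_s = transformer.fit_transform(X)
```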
Sample Program

This program loads iris data, normalizes features using SciPy's z-score inside a pipeline, trains logistic regression, and prints predictions.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np
import scipy.stats

def zscore_transform(X):
    return scipy.stats.zscore(X, axis=0)

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Use only two classes for logistic regression
X = X[y != 2]
y = y[y != 2]

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Create pipeline with SciPy z-score normalization and logistic regression
pipeline = Pipeline([
    ('zscore', FunctionTransformer(zscore_transform)),
    ('model', LogisticRegression())
])

# Train model
pipeline.fit(X_train, y_train)

# Predict
predictions = pipeline.predict(X_test)

# Print predictions
print(predictions)
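Because all the steps live in one Pipeline object, it can be handed directly to scikit-learn's model-selection utilities. A hedged sketch, reusing the same function and step names as the program above, with cross_val_score:

```python
import scipy.stats
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score

def zscore_transform(X):
    return scipy.stats.zscore(X, axis=0)

# Same two-class iris subset as in the sample program.
iris = load_iris()
X, y = iris.data[iris.target != 2], iris.target[iris.target != 2]

pipeline = Pipeline([
    ('zscore', FunctionTransformer(zscore_transform)),
    ('model', LogisticRegression())
])

# One accuracy score per fold; the transform is reapplied inside each fold.
scores = cross_val_score(pipeline, X, y, cv=5)
```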
Important Notes

FunctionTransformer expects a function that takes an array and returns an array.

Make sure your SciPy function does not change the number of rows, or the features will no longer line up with the labels.

Pipelines apply the same sequence of transformations to training and test data, which keeps the workflow consistent. Note, however, that FunctionTransformer is stateless: data-dependent statistics (like the z-score's mean and standard deviation) are recomputed on every dataset it sees, rather than learned from the training set.
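For strict fit-on-train semantics, scikit-learn's StandardScaler does what the stateless z-score transform cannot: it learns the mean and standard deviation from the training data and reuses them on new data. A tiny sketch with invented numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[10.0]])

# fit learns the statistics from the training data only:
# mean = 2, population std = sqrt(2/3).
scaler = StandardScaler().fit(X_train)

# transform reuses those training statistics, avoiding leakage
# from the test set.
X_test_scaled = scaler.transform(X_test)
```

Swapping StandardScaler in for the FunctionTransformer step gives the same pipeline structure with proper train/test separation.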

Summary

SciPy functions can be used inside scikit-learn pipelines with FunctionTransformer.

Pipelines keep data processing and modeling steps organized and repeatable.

This approach helps beginners build clean, understandable data science workflows.