SciPy functions can be combined with scikit-learn pipelines to clean and prepare data step by step and then train a model, keeping the whole workflow organized and repeatable.
SciPy with scikit-learn pipeline
```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.linear_model import LogisticRegression
import scipy

# your_scipy_function is a placeholder for any array-in/array-out
# SciPy-based transform you want to apply before the model.
pipeline = Pipeline([
    ('scipy_transform', FunctionTransformer(your_scipy_function)),
    ('model', LogisticRegression())
])

pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
```
Pipeline lets you chain steps: data transformations run first, then model training.
FunctionTransformer wraps SciPy functions so they work inside the pipeline.
```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.linear_model import LogisticRegression
import numpy as np

def log_transform(X):
    return np.log1p(X)

pipeline = Pipeline([
    ('log', FunctionTransformer(log_transform)),
    ('model', LogisticRegression())
])
```
```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.linear_model import LogisticRegression
import scipy.ndimage

def smooth_data(X):
    return scipy.ndimage.gaussian_filter(X, sigma=1)

pipeline = Pipeline([
    ('smooth', FunctionTransformer(smooth_data)),
    ('model', LogisticRegression())
])
```
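Since a Pipeline chains any number of steps, transforms like these can also be stacked before the model. A minimal sketch on the iris data (the lambda, the `gaussian_filter1d` axis choice, and the dataset here are example choices, not part of the snippets above):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
import numpy as np
import scipy.ndimage

# Two SciPy-based transforms chained before the model:
# log-scale the features, then smooth each feature column.
pipeline = Pipeline([
    ('log', FunctionTransformer(np.log1p)),
    ('smooth', FunctionTransformer(
        lambda X: scipy.ndimage.gaussian_filter1d(X, sigma=1, axis=0))),
    ('model', LogisticRegression(max_iter=1000))
])

X, y = load_iris(return_X_y=True)
pipeline.fit(X, y)
print(pipeline.score(X, y))
```

Each step's output becomes the next step's input, so the model always sees fully transformed data.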
This program loads iris data, normalizes features using SciPy's z-score inside a pipeline, trains logistic regression, and prints predictions.
```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import scipy.stats

def zscore_transform(X):
    return scipy.stats.zscore(X, axis=0)

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Use only two classes for logistic regression
X = X[y != 2]
y = y[y != 2]

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Create pipeline with SciPy z-score normalization and logistic regression
pipeline = Pipeline([
    ('zscore', FunctionTransformer(zscore_transform)),
    ('model', LogisticRegression())
])

# Train model
pipeline.fit(X_train, y_train)

# Predict
predictions = pipeline.predict(X_test)

# Print predictions
print(predictions)
```
FunctionTransformer expects a function that takes an array and returns an array.
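A quick way to see this is to call the transformer directly on a small array; the numbers below are just made-up sample data:

```python
import numpy as np
import scipy.stats
from sklearn.preprocessing import FunctionTransformer

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# FunctionTransformer passes the input array to the wrapped function
# and returns whatever array the function produces.
transformer = FunctionTransformer(scipy.stats.zscore)
X_t = transformer.fit_transform(X)

print(X_t.shape)  # same shape as the input: (3, 2)
```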
Make sure SciPy functions do not change the shape unexpectedly.
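One simple safeguard is to wrap the SciPy call in a helper that asserts the shape is preserved; `safe_transform` is our own illustrative name, not a scikit-learn or SciPy API:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

def safe_transform(X):
    # Assert the transform keeps the 2-D shape that
    # downstream scikit-learn estimators expect.
    result = np.log1p(X)
    assert result.shape == X.shape, "transform changed the array shape"
    return result

X = np.arange(6, dtype=float).reshape(3, 2)
X_t = FunctionTransformer(safe_transform).fit_transform(X)
print(X_t.shape)  # (3, 2)
```

A function that reduces the array (for example, one that sums over an axis) would trip the assertion instead of causing a confusing error later in the pipeline.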
Pipelines help avoid data leakage: transformers are fitted only on the training data, and the fitted transformation is then applied to the test data. Note that a stateless FunctionTransformer, like the z-score one above, recomputes its statistics on whatever data it receives, so for strict train/test separation a fitted scaler such as StandardScaler is safer.
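The difference between a stateless transform and a fitted one is easy to see on toy data (the arrays below are made up for illustration):

```python
import numpy as np
import scipy.stats
from sklearn.preprocessing import FunctionTransformer, StandardScaler

X_train = np.array([[0.0], [2.0], [4.0]])
X_test = np.array([[100.0], [200.0]])

# Stateless: the z-score is recomputed from the test data itself,
# so the test set is always centered on its own mean.
stateless = FunctionTransformer(scipy.stats.zscore)
stateless.fit(X_train)
print(stateless.transform(X_test).ravel())  # [-1.  1.]

# Fitted: StandardScaler reuses the mean and std learned from X_train,
# so the unusually large test values stand out as large z-scores.
fitted = StandardScaler().fit(X_train)
print(fitted.transform(X_test).ravel())
```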
SciPy functions can be used inside scikit-learn pipelines with FunctionTransformer.
Pipelines keep data processing and modeling steps organized and repeatable.
This approach helps beginners build clean, understandable data science workflows.