
Custom transformers in ML Python

Introduction
Custom transformers let you create your own data processing steps so data is prepared exactly how you want before training a model. They are useful:
When you need to clean or reshape data in a way not covered by the built-in tools.
When you want to reuse the same preprocessing steps across different projects.
When you want your data transformations to run inside a machine learning pipeline.
When you want to keep your code organized by separating data preparation from model training.
Syntax
ML Python
from sklearn.base import BaseEstimator, TransformerMixin

class MyTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, param=1):
        # initialize parameters
        self.param = param

    def fit(self, X, y=None):
        # learn from data if needed
        return self

    def transform(self, X):
        # change data and return it
        return X
Custom transformers should inherit from BaseEstimator and TransformerMixin to work well with scikit-learn pipelines: BaseEstimator supplies get_params and set_params (needed for cloning and hyperparameter search), and TransformerMixin supplies fit_transform for free.
The fit method learns anything it needs from the training data (it may do nothing), and transform applies the change and returns the transformed data.
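To make the optional fit step concrete, here is a sketch of a hypothetical MeanCenterer (not part of scikit-learn) that actually learns something during fit, the per-column means, and applies them in transform. Note that fit_transform is not defined anywhere in the class; it is inherited from TransformerMixin.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class MeanCenterer(BaseEstimator, TransformerMixin):
    """Subtracts the per-column mean learned during fit."""

    def fit(self, X, y=None):
        # Learn the column means from the training data only
        self.means_ = np.asarray(X).mean(axis=0)
        return self

    def transform(self, X):
        # Apply the learned means to any data, including unseen data
        return np.asarray(X) - self.means_

X = np.array([[1.0, 10.0],
              [3.0, 30.0]])
# fit_transform is provided automatically by TransformerMixin
centered = MeanCenterer().fit_transform(X)
print(centered)  # [[-1. -10.] [ 1.  10.]]
```

Learning statistics in fit and applying them in transform is what lets a transformer use training-set statistics on test data without leaking information.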
Examples
This transformer adds a fixed number to all values in the data.
ML Python
from sklearn.base import BaseEstimator, TransformerMixin

class AddConstantTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, constant=1):
        self.constant = constant

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X + self.constant
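A short usage sketch of the AddConstantTransformer above also shows what inheriting from BaseEstimator buys you: get_params works without any extra code, because the __init__ parameter is stored under an attribute of the same name.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class AddConstantTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, constant=1):
        self.constant = constant

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X + self.constant

t = AddConstantTransformer(constant=5)
out = t.fit_transform(np.array([1, 2, 3]))
print(out)             # [6 7 8]
print(t.get_params())  # {'constant': 5} -- provided by BaseEstimator
```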
This transformer applies a log transform to make data less skewed.
ML Python
from sklearn.base import BaseEstimator, TransformerMixin
import numpy as np

class LogTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.log1p(X)
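As a quick illustration, applying the LogTransformer to right-skewed values compresses the large ones far more than the small ones; np.log1p computes log(1 + x), so it is also safe for zeros.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class LogTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.log1p(X)

# Heavily right-skewed sample values (note the zero is handled safely)
X = np.array([0.0, 9.0, 99.0, 999.0])
logged = LogTransformer().fit_transform(X)
print(logged)  # roughly [0.  2.3  4.6  6.9]
```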
Sample Model
This program creates a custom transformer that squares input numbers, then trains a linear regression model on the transformed data. It shows how to use the transformer inside a pipeline.
ML Python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import numpy as np

# Custom transformer that squares the input features
class SquareTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X ** 2

# Create sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Build pipeline with custom transformer and linear regression
pipeline = Pipeline([
    ('square', SquareTransformer()),
    ('model', LinearRegression())
])

# Train model
pipeline.fit(X_train, y_train)

# Predict on test data
predictions = pipeline.predict(X_test)

# Print predictions and score
print('Predictions:', predictions)
print('Model R^2 score:', pipeline.score(X_test, y_test))
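Building on the pipeline idea, one payoff of inheriting from BaseEstimator is that a transformer's parameters become tunable hyperparameters. The sketch below uses a hypothetical PowerTransformerStep (an illustration, not a scikit-learn class) whose exponent is selected by GridSearchCV through the pipeline's 'step__param' naming convention.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

class PowerTransformerStep(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that raises features to a tunable power."""

    def __init__(self, power=1):
        self.power = power

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X ** self.power

# Synthetic data where the true relationship uses power 2
X = np.arange(1, 21, dtype=float).reshape(-1, 1)
y = X.ravel() ** 2 + 3

pipeline = Pipeline([
    ('power', PowerTransformerStep()),
    ('model', LinearRegression()),
])

# Nested parameters are addressed as '<step name>__<parameter name>',
# so the transformer's power is tuned like any other hyperparameter.
search = GridSearchCV(pipeline, {'power__power': [1, 2, 3]}, cv=3)
search.fit(X, y)
print('Best power:', search.best_params_['power__power'])  # Best power: 2
```

This only works because get_params/set_params (from BaseEstimator) let GridSearchCV clone the pipeline and set the transformer's parameter on each candidate.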
Important Notes
Always return self from fit so calls can be chained (e.g., fit_transform) and the transformer works inside pipelines.
Store every __init__ parameter under an attribute of the same name; scikit-learn's get_params relies on this convention.
Transformers should not change the number of samples (rows) in the data.
Use pipelines to combine custom transformers with models for clean, reusable code.
Summary
Custom transformers let you create your own data processing steps.
They must define fit and transform methods and should inherit from BaseEstimator and TransformerMixin.
Use them inside pipelines to keep your machine learning code organized and reusable.