What are custom transformers in ML Python?

Introduction

Custom transformers let you create your own data processing steps, so you can prepare data exactly how you want before training a model. They are useful in these situations:

When you need to clean or change data in a special way not covered by built-in tools.

When you want to reuse the same data processing steps in different projects easily.

When you want to include your data changes inside a machine learning pipeline.

When you want to keep your code organized and clear by separating data preparation from model training.

Syntax

ML Python

from sklearn.base import BaseEstimator, TransformerMixin

class MyTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, param=1):
        # initialize parameters
        self.param = param

    def fit(self, X, y=None):
        # learn from data if needed
        return self

    def transform(self, X):
        # change data and return it
        return X

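Custom transformers should inherit from BaseEstimator and TransformerMixin to work well with scikit-learn pipelines. BaseEstimator provides get_params and set_params, which scikit-learn uses to clone and tune the transformer, and TransformerMixin provides fit_transform. Recent scikit-learn documentation recommends listing mixins before BaseEstimator in the class definition; the order shown here also runs.

The fit method learns whatever the transformer needs from the training data (it may learn nothing) and returns self. The transform method applies the change to the data and returns the result.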
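To make the split between fit and transform concrete, here is a minimal sketch of a transformer that actually learns something in fit. The class name MeanCenterer and the sample arrays are hypothetical and not part of the original lesson; the attribute name mean_ follows the scikit-learn convention of a trailing underscore for learned values.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class MeanCenterer(BaseEstimator, TransformerMixin):
    """Hypothetical example: subtract the per-column mean learned in fit."""

    def fit(self, X, y=None):
        # Learn one mean per column from the training data only
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        # Apply the means learned in fit; nothing is re-learned here
        return np.asarray(X, dtype=float) - self.mean_

X_train = np.array([[1.0, 10.0], [3.0, 30.0]])
X_new = np.array([[2.0, 20.0]])

centerer = MeanCenterer()
# fit_transform is inherited from TransformerMixin: it calls fit, then transform
train_centered = centerer.fit_transform(X_train)
# New data is shifted by the training means, not by its own means
new_centered = centerer.transform(X_new)

print(train_centered)
print(new_centered)
```

Because the means are stored during fit, new data is always shifted by the training statistics, which is the behavior a pipeline relies on at prediction time.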

Examples

This transformer adds a fixed number to all values in the data.

ML Python

from sklearn.base import BaseEstimator, TransformerMixin

class AddConstantTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, constant=1):
        self.constant = constant

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X + self.constant
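A short usage sketch for this transformer may help. It repeats the class so the snippet runs on its own; the sample array and the variable names are made up for illustration. It also shows what inheriting from BaseEstimator provides: get_params, set_params, and compatibility with clone.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone

class AddConstantTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, constant=1):
        self.constant = constant

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X + self.constant

X = np.array([[1, 2], [3, 4]])  # illustrative data

adder = AddConstantTransformer(constant=10)
shifted = adder.fit_transform(X)   # every value increased by 10

# get_params reads the arguments stored in __init__
params = adder.get_params()

# clone builds a fresh, unfitted copy with the same parameters
adder_copy = clone(adder)

# set_params changes a parameter without creating a new object
adder.set_params(constant=5)
shifted_again = adder.transform(X)

print(shifted)
print(params)
print(shifted_again)
```

For get_params to work, each __init__ argument should be stored on self under the same name, as the constant argument is here.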

This transformer applies np.log1p, which computes log(1 + x), to make right-skewed data less skewed. Input values must be greater than -1.

ML Python

from sklearn.base import BaseEstimator, TransformerMixin
import numpy as np

class LogTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.log1p(X)
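As a quick check of what this transformer does, the sketch below applies it to a small made-up array and then undoes the change with np.expm1, the NumPy inverse of np.log1p. The sample values are illustrative only.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class LogTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.log1p(X)

# Illustrative right-skewed values spanning several orders of magnitude
X = np.array([[0.0], [9.0], [99.0], [999.0]])

logged = LogTransformer().fit_transform(X)

# np.expm1 computes exp(x) - 1, so it recovers the original values
restored = np.expm1(logged)

print(logged)
print(restored)
```

Using log1p instead of a plain log keeps zero values valid, because log1p(0) is 0.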

Sample Model

This program creates a custom transformer that squares input numbers, then trains a linear regression model on the transformed data. It shows how to use the transformer inside a pipeline.

ML Python

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import numpy as np

# Custom transformer that squares the input features
class SquareTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X ** 2

# Create sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Build pipeline with custom transformer and linear regression
pipeline = Pipeline([
    ('square', SquareTransformer()),
    ('model', LinearRegression())
])

# Train model
pipeline.fit(X_train, y_train)

# Predict on test data
predictions = pipeline.predict(X_test)

# Print predictions and score
print('Predictions:', predictions)
print('Model R^2 score:', pipeline.score(X_test, y_test))


Important Notes

Always return self in the fit method to allow chaining in pipelines.

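Transformers should not change the number of samples (rows) in the data, because the pipeline passes the target values through unchanged and each row must stay aligned with its target.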

Use pipelines to combine custom transformers with models for clean and reusable code.
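The notes above can be sketched in one small pipeline. The PowerFeatures class, its exponent parameter, and the data are hypothetical and chosen only for illustration. The sketch shows fit returning self, the row count staying the same, and a transformer parameter being changed through the pipeline with the step-name double-underscore syntax.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

class PowerFeatures(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: raise every feature to a fixed power."""

    def __init__(self, exponent=1):
        self.exponent = exponent

    def fit(self, X, y=None):
        return self  # returning self lets calls be chained

    def transform(self, X):
        # Same number of rows out as in
        return np.asarray(X, dtype=float) ** self.exponent

# Illustrative data where the target is the cube of the feature
X = np.arange(1, 7, dtype=float).reshape(-1, 1)
y = X.ravel() ** 3

pipeline = Pipeline([
    ('power', PowerFeatures()),
    ('model', LinearRegression())
])

# With exponent=1 the model sees the raw feature
score_linear = pipeline.fit(X, y).score(X, y)

# Change the transformer's parameter through the pipeline: <step name>__<parameter>
pipeline.set_params(power__exponent=3)
score_cubic = pipeline.fit(X, y).score(X, y)

transformed = pipeline.named_steps['power'].transform(X)

print('R^2 with exponent=1:', score_linear)
print('R^2 with exponent=3:', score_cubic)
print('Rows before and after:', X.shape[0], transformed.shape[0])
```

The same step-name syntax is what tools such as GridSearchCV use to search over transformer parameters, which is one reason to keep parameters in __init__.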

Summary

Custom transformers let you create your own data processing steps.

They need fit and transform methods, and they should inherit from BaseEstimator and TransformerMixin so that parameter handling and fit_transform come for free.

Use them inside pipelines to keep your machine learning code organized and reusable.

Practice

1. What is the main purpose of creating a custom transformer in machine learning pipelines?

easy

A. To train a machine learning model directly

B. To define a reusable data processing step with fit and transform methods

C. To visualize data distributions

D. To store the final predictions of a model

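Solution

Step 1: Understand the role of transformers

A transformer is a data processing step: fit learns from the data if needed, and transform changes the data.

Step 2: Identify the purpose of custom transformers

A custom transformer packages your own processing step behind those two methods so it can be reused and placed inside a pipeline. It does not train a model, visualize data, or store predictions.

Final Answer:

B. To define a reusable data processing step with fit and transform methods

Quick Check:

Reusable processing step with fit and transform = option B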
