
Custom transformers in ML Python - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is a custom transformer in machine learning?
A custom transformer is a user-defined class that applies a specific transformation to data before it reaches a model. It prepares or cleans the data so the model can learn from it more effectively.
beginner
Which two methods must a custom transformer implement in scikit-learn?
A custom transformer must implement fit(), which learns any needed parameters from the data and returns self, and transform(), which changes the data using what fit() learned.
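As an illustration of the fit()/transform() split, here is a minimal sketch. The class name MeanCenterer and the attribute mean_ are hypothetical names chosen for this example, not part of scikit-learn.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class MeanCenterer(BaseEstimator, TransformerMixin):
    """Hypothetical example: subtract the per-column mean learned in fit()."""

    def fit(self, X, y=None):
        # Learn one mean per column from the training data.
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        return self  # returning self allows chaining: fit(X).transform(X)

    def transform(self, X):
        # Apply the learned means; nothing new is learned here.
        X = np.asarray(X, dtype=float)
        return X - self.mean_


X_train = np.array([[1.0, 10.0], [3.0, 30.0]])
centerer = MeanCenterer().fit(X_train)
centered = centerer.transform(X_train)
print(centered)
```

The column means learned here are 2 and 20, so each column of the result is centered around zero.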
intermediate
Why use custom transformers instead of built-in ones?
Custom transformers let you handle domain-specific data or apply transformations that the built-in transformers do not cover, so the preprocessing matches your specific problem.
intermediate
How does a custom transformer fit into a machine learning pipeline?
It acts as a step that changes data before the model sees it. This keeps data preparation organized and repeatable for training and testing.
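A small sketch of a custom transformer used as a pipeline step follows. AbsoluteValue is a hypothetical transformer made up for this example, and the toy data is chosen so that the target equals the absolute value of the feature.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


class AbsoluteValue(BaseEstimator, TransformerMixin):
    """Hypothetical stateless transformer: replace each value by its absolute value."""

    def fit(self, X, y=None):
        # Nothing to learn, but fit() must still exist and return self.
        return self

    def transform(self, X):
        return np.abs(np.asarray(X, dtype=float))


# The transformer runs first; the model only ever sees transformed data.
pipeline = Pipeline([
    ("abs", AbsoluteValue()),
    ("model", LinearRegression()),
])

X_train = np.array([[-1.0], [2.0], [-3.0], [4.0]])
y_train = np.array([1.0, 2.0, 3.0, 4.0])  # toy target: y = |x|

pipeline.fit(X_train, y_train)
prediction = pipeline.predict(np.array([[-5.0]]))
print(prediction)
```

Because the same pipeline object handles both fit() and predict(), the preparation step is applied identically to training and new data.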
beginner
What is the purpose of the fit_transform() method in custom transformers?
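It combines fit() and transform() in one call: it learns from the data and then immediately transforms that same data. Inheriting from TransformerMixin provides fit_transform() automatically, so you usually do not write it yourself.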
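The sketch below checks that the inherited fit_transform() gives the same result as calling fit() and then transform(). MaxScaler and max_ are hypothetical names used only for this example, and the example assumes no column maximum is zero.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class MaxScaler(BaseEstimator, TransformerMixin):
    """Hypothetical example: divide each column by its maximum from fit()."""

    def fit(self, X, y=None):
        self.max_ = np.asarray(X, dtype=float).max(axis=0)
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) / self.max_


X = np.array([[1.0, 4.0], [2.0, 8.0]])

# fit_transform() is not defined above; it comes from TransformerMixin.
one_step = MaxScaler().fit_transform(X)
two_step = MaxScaler().fit(X).transform(X)

print(np.array_equal(one_step, two_step))
```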
What method in a custom transformer changes the data?
A. fit()
B. transform()
C. predict()
D. score()
Why might you create a custom transformer?
A. To visualize data
B. To speed up model training by skipping data preparation
C. To handle special data transformations not covered by built-in tools
D. To replace the model's prediction step
Which scikit-learn classes are commonly extended to create a custom transformer?
A. ClassifierMixin
B. Pipeline
C. RegressorMixin
D. BaseEstimator and TransformerMixin
What does the fit() method do in a custom transformer?
A. Learns parameters from the data
B. Changes the data
C. Makes predictions
D. Evaluates model accuracy
Where in a machine learning workflow is a custom transformer used?
A. Before the model to prepare data
B. After the model to check results
C. During prediction to improve speed
D. To replace the model training
Explain how to create a custom transformer in scikit-learn and why it is useful.
Think about the methods needed and the role in data preparation.
Describe the role of custom transformers in a machine learning pipeline and how they improve model building.
Consider the flow of data from raw to model-ready.

      Practice

      (1/5)
      1. What is the main purpose of creating a custom transformer in machine learning pipelines?
      easy
      A. To train a machine learning model directly
      B. To define a reusable data processing step with fit and transform methods
      C. To visualize data distributions
      D. To store the final predictions of a model

      Solution

      1. Step 1: Understand the role of transformers

        Transformers process data by learning parameters in fit and applying changes in transform.
      2. Step 2: Identify the purpose of custom transformers

        Custom transformers let you create your own data processing steps reusable in pipelines.
      3. Final Answer:

        To define a reusable data processing step with fit and transform methods -> Option B
      4. Quick Check:

        Custom transformer = reusable data step [OK]
      Hint: Custom transformers handle data prep, not model training [OK]
      Common Mistakes:
      • Confusing transformers with models
      • Thinking transformers visualize data
      • Assuming transformers store predictions
      2. Which of the following is the correct way to start defining a custom transformer class in Python using scikit-learn?
      easy
      A. class MyTransformer(Pipeline):
      B. class MyTransformer(Model):
      C. class MyTransformer(BaseEstimator, TransformerMixin):
      D. def MyTransformer():

      Solution

      1. Step 1: Recall inheritance for custom transformers

        Custom transformers inherit from BaseEstimator, which provides get_params() and set_params(), and from TransformerMixin, which provides fit_transform(). You still write fit and transform yourself.
      2. Step 2: Match correct class definition syntax

        class MyTransformer(BaseEstimator, TransformerMixin): correctly shows class inheritance from BaseEstimator and TransformerMixin.
      3. Final Answer:

        class MyTransformer(BaseEstimator, TransformerMixin): -> Option C
      4. Quick Check:

        Inheritance from BaseEstimator and TransformerMixin = class MyTransformer(BaseEstimator, TransformerMixin): [OK]
      Hint: Custom transformers inherit BaseEstimator and TransformerMixin [OK]
      Common Mistakes:
      • Using Model or Pipeline as base classes
      • Defining transformer as a function
      • Missing inheritance entirely
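To show what each base class contributes, here is a short sketch. The Scale class is a hypothetical example; get_params(), set_params(), and clone() are existing scikit-learn features.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone


class Scale(BaseEstimator, TransformerMixin):
    """Hypothetical example: multiply the data by a fixed factor."""

    def __init__(self, factor=2):
        # Store constructor arguments unchanged so get_params() can find them.
        self.factor = factor

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.asarray(X) * self.factor


scaler = Scale(factor=3)

# From BaseEstimator: parameter inspection and updating.
params = scaler.get_params()
scaler.set_params(factor=10)

# clone() builds a new, unfitted copy with the same parameters.
scaler_copy = clone(scaler)

# From TransformerMixin: fit_transform() is available without writing it.
has_fit_transform = hasattr(scaler, "fit_transform")

print(params, scaler_copy.factor, has_fit_transform)
```

These inherited methods are what let tools such as grid search set and copy the transformer's parameters.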
      3. Given this custom transformer code snippet, what will print(transformed_data) output?
      from sklearn.base import BaseEstimator, TransformerMixin
      import numpy as np
      
      class AddConstant(BaseEstimator, TransformerMixin):
          def __init__(self, constant=1):
              self.constant = constant
          def fit(self, X, y=None):
              return self
          def transform(self, X):
              return X + self.constant
      
      X = np.array([[1, 2], [3, 4]])
      transformer = AddConstant(constant=5)
      transformed_data = transformer.fit_transform(X)
      print(transformed_data)
      medium
      A. [[6 7] [8 9]]
      B. [[1 2] [3 4]]
      C. [[5 5] [5 5]]
      D. Error: fit_transform method not defined

      Solution

      1. Step 1: Understand transform method behavior

        The transform method adds the constant (5) to every element in X.
      2. Step 2: Calculate transformed data

        Original X is [[1,2],[3,4]]. Adding 5 gives [[6,7],[8,9]].
      3. Final Answer:

        [[6 7] [8 9]] -> Option A
      4. Quick Check:

        Adding constant 5 to X = [[6 7] [8 9]] [OK]
      Hint: transform adds constant to all elements [OK]
      Common Mistakes:
      • Thinking fit_transform is missing
      • Forgetting to add constant
      • Confusing output with original data
      4. What is wrong with this custom transformer code?
      from sklearn.base import BaseEstimator, TransformerMixin
      
      class MultiplyTransformer(BaseEstimator, TransformerMixin):
          def __init__(self, factor=2):
              self.factor = factor
          def fit(self, X, y=None):
              return self
          def transform(self, X):
              return X * self.factor
      
      transformer = MultiplyTransformer(factor=3)
      result = transformer.transform([1, 2, 3])
      print(result)
      medium
      A. transform method should convert input to numpy array before multiplying
      B. fit method is missing a return statement
      C. factor should be a list, not an int
      D. Class should inherit from Pipeline, not BaseEstimator

      Solution

      1. Step 1: Check input type handling in transform

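        The input is a Python list, and multiplying a list by an int repeats the list instead of multiplying element-wise. The code runs without an error but prints [1, 2, 3, 1, 2, 3, 1, 2, 3] rather than the intended [3, 6, 9].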
      2. Step 2: Fix transform to convert input to numpy array

        Converting input to numpy array allows element-wise multiplication as intended.
      3. Final Answer:

        transform method should convert input to numpy array before multiplying -> Option A
      4. Quick Check:

        List * int repeats list, need numpy array for element-wise multiply [OK]
      Hint: Use numpy arrays for element-wise math in transform [OK]
      Common Mistakes:
      • Assuming list * int does element-wise multiply
      • Thinking fit is missing its return statement (it is present)
      • Wrong base class inheritance
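One possible fix for the transformer in question 4 is sketched below, using np.asarray to convert the input inside transform. Other conversions would also work; this is only one option.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class MultiplyTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, factor=2):
        self.factor = factor

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Convert lists (or other array-likes) so * means element-wise multiply.
        return np.asarray(X) * self.factor


# Plain list behaviour, for comparison: the list is repeated three times.
list_result = [1, 2, 3] * 3

transformer = MultiplyTransformer(factor=3)
result = transformer.transform([1, 2, 3])

print(list_result)
print(result)
```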
      5. You want to create a custom transformer that replaces missing values in a dataset with the median of each column, then scales the data by dividing by the max value per column. Which approach correctly combines these steps in one transformer?
      hard
      A. In fit, replace missing values; in transform, compute medians and max values
      B. Use two separate transformers instead of one custom transformer
      C. Only implement transform method to do all steps without fit
      D. In fit, compute medians and max values; in transform, replace missing with medians and divide by max values

      Solution

      1. Step 1: Understand fit and transform roles

        fit calculates statistics (median, max) from training data; transform applies these to new data.
      2. Step 2: Apply correct sequence in methods

        Option D follows this split: fit computes the per-column medians and max values from the training data, and transform uses those stored values to replace missing entries and then divide each column by its max.
      3. Final Answer:

        In fit, compute medians and max values; in transform, replace missing with medians and divide by max values -> Option D
      4. Quick Check:

        fit learns stats, transform applies them [OK]
      Hint: fit learns stats; transform applies them to data [OK]
      Common Mistakes:
      • Doing data replacement in fit instead of transform
      • Skipping fit method
      • Using separate transformers unnecessarily
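The approach in Option D could be sketched as follows. The class name MedianImputeMaxScaler and the attributes medians_ and maxes_ are hypothetical. The sketch assumes missing values are encoded as NaN, that every column has at least one non-missing value, and that no column maximum is zero.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class MedianImputeMaxScaler(BaseEstimator, TransformerMixin):
    """Hypothetical example: fill NaN with column medians, then divide by column max."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learn statistics from the training data only, ignoring NaN.
        self.medians_ = np.nanmedian(X, axis=0)
        self.maxes_ = np.nanmax(X, axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Replace missing entries with the median of their column.
        filled = np.where(np.isnan(X), self.medians_, X)
        # Scale each column by the max seen during fit.
        return filled / self.maxes_


X_train = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
    [5.0, 40.0],
])
X_new = np.array([[np.nan, 80.0]])

scaler = MedianImputeMaxScaler().fit(X_train)
train_out = scaler.transform(X_train)
new_out = scaler.transform(X_new)

print(train_out)
print(new_out)
```

Note that new data is filled and scaled with the training statistics, so a new value larger than the training max (80 against 40 here) scales to more than 1.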