
ColumnTransformer for mixed types in ML Python - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is the purpose of ColumnTransformer in machine learning?

ColumnTransformer helps apply different data transformations to different columns in a dataset. This is useful when columns have mixed types, like numbers and categories.

beginner
How does ColumnTransformer handle numeric and categorical columns differently?

It allows you to specify separate transformers for numeric columns (like scaling) and categorical columns (like one-hot encoding) in one step.
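
As a minimal sketch of this idea, the snippet below scales one numeric column and one-hot encodes one categorical column in a single step. The DataFrame and its column names (`height_cm`, `color`) are made up for illustration and are not part of the original card.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one numeric column and one categorical column.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0],
    "color": ["red", "blue", "red"],
})

# Each entry is (name, transformer, columns the transformer applies to).
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["height_cm"]),
    ("cat", OneHotEncoder(), ["color"]),
])

features = preprocessor.fit_transform(df)
# One scaled column plus one binary column per distinct color (blue, red).
n_rows, n_cols = features.shape
```

The output columns follow the order of the transformer list, so the scaled number comes first and the encoded colors after it.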

intermediate
Why is it better to use ColumnTransformer instead of transforming columns separately?

Using ColumnTransformer keeps all preprocessing in one object, so the transformations fitted on the training data are applied the same way to new data. It also plugs directly into a scikit-learn Pipeline alongside the model.
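
One way this could look in practice is sketched below: a ColumnTransformer placed as the first step of a Pipeline, followed by a classifier. The data, the column names, and the choice of LogisticRegression are assumptions made for the example, not something the card prescribes.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; column names and labels are made up for illustration.
train = pd.DataFrame({
    "age": [22, 35, 47, 51, 29, 63],
    "city": ["Paris", "Rome", "Paris", "Oslo", "Rome", "Oslo"],
})
labels = [0, 0, 1, 1, 0, 1]

preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    # handle_unknown="ignore" encodes categories unseen during fit as all zeros.
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Preprocessing and model are fitted together and reused together at predict time.
model = Pipeline([
    ("preprocess", preprocessor),
    ("classify", LogisticRegression()),
])
model.fit(train, labels)

# "Madrid" never appeared in training; the encoder ignores it instead of failing.
new_rows = pd.DataFrame({"age": [30, 58], "city": ["Rome", "Madrid"]})
predictions = model.predict(new_rows)
```

Because the scaler and encoder live inside the pipeline, calling `predict` reuses the statistics learned during `fit` rather than recomputing them on the new rows.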

intermediate
What happens if you don't specify remainder='passthrough' in ColumnTransformer?

Columns not listed in transformers are dropped by default. Using remainder='passthrough' keeps those columns unchanged.
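
The difference can be checked with a small sketch like the one below. The DataFrame and the unlisted `score` column are hypothetical; only the `remainder` parameter comes from the card.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "age": [20.0, 30.0, 40.0],
    "income": [1000.0, 2000.0, 3000.0],
    "score": [0.1, 0.5, 0.9],  # not listed in any transformer
})

# Default behaviour: remainder="drop", so "score" disappears from the output.
dropped = ColumnTransformer([("num", StandardScaler(), ["age", "income"])])

# remainder="passthrough" appends the unlisted columns, unchanged, on the right.
kept = ColumnTransformer(
    [("num", StandardScaler(), ["age", "income"])],
    remainder="passthrough",
)

dropped_out = np.asarray(dropped.fit_transform(df))  # shape (3, 2)
kept_out = np.asarray(kept.fit_transform(df))        # shape (3, 3)
```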

beginner
Give an example of transformers used for numeric and categorical data in ColumnTransformer.

Numeric: StandardScaler() to scale numbers.
Categorical: OneHotEncoder() to convert categories into binary columns.
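
To see what each transformer does on its own, here is a short sketch that runs them separately on tiny made-up arrays (the values are illustrative only).

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numbers = np.array([[1.0], [2.0], [3.0]])
colors = np.array([["red"], ["blue"], ["green"]])

# StandardScaler subtracts the column mean and divides by the standard deviation.
scaled = StandardScaler().fit_transform(numbers)

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; toarray() makes it dense.
one_hot = encoder.fit_transform(colors).toarray()

# Categories are stored in sorted order: blue, green, red.
categories = list(encoder.categories_[0])
```

After scaling, the column has mean 0 and standard deviation 1. Each row of `one_hot` contains a single 1 in the position of that row's category.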

What does ColumnTransformer do?
A. Splits data into training and testing sets
B. Trains multiple models at once
C. Visualizes data distributions
D. Applies different transformations to different columns
Which transformer is commonly used for numeric columns?
A. StandardScaler
B. OneHotEncoder
C. LabelEncoder
D. CountVectorizer
What does OneHotEncoder do for categorical data?
A. Converts categories into numbers 1, 2, 3...
B. Removes missing values
C. Creates binary columns for each category
D. Scales categories between 0 and 1
If you want to keep columns not transformed by ColumnTransformer, what parameter do you use?
A. remainder='passthrough'
B. keep='all'
C. drop=False
D. preserve=True
Why is using ColumnTransformer helpful in a machine learning pipeline?
A. It automatically selects the best model
B. It organizes preprocessing for mixed data types
C. It increases model accuracy by itself
D. It visualizes the data
Explain how ColumnTransformer helps when your dataset has both numeric and categorical columns.
Think about how you prepare numbers and categories differently.
    Describe what happens if you do not set the remainder parameter in ColumnTransformer.
    Consider what happens to columns you forget to mention.

      Practice

      1. What is the main purpose of using ColumnTransformer in machine learning?
      easy
      A. To train multiple models on the same dataset
      B. To apply different preprocessing steps to different columns in a dataset
      C. To visualize data distributions
      D. To split data into training and testing sets

      Solution

      1. Step 1: Understand the role of ColumnTransformer

        ColumnTransformer allows applying different transformations to different columns, such as scaling numbers and encoding text.
      2. Step 2: Compare with other options

        Training models, visualizing data, or splitting data are different tasks not handled by ColumnTransformer.
      3. Final Answer:

        To apply different preprocessing steps to different columns in a dataset -> Option B
      4. Quick Check:

        ColumnTransformer = Different preprocessing per column [OK]
      Hint: Think: Different columns, different treatments [OK]
      Common Mistakes:
      • Confusing ColumnTransformer with model training
      • Thinking it splits data instead of transforming
      • Assuming it visualizes data
      2. Which of the following is the correct way to import ColumnTransformer from scikit-learn?
      easy
      A. from sklearn.feature_extraction import ColumnTransformer
      B. from sklearn.preprocessing import ColumnTransformer
      C. from sklearn.pipeline import ColumnTransformer
      D. from sklearn.compose import ColumnTransformer

      Solution

      1. Step 1: Recall the module for ColumnTransformer

        ColumnTransformer is part of the compose module in scikit-learn.
      2. Step 2: Verify other options

        Preprocessing, pipeline, and feature_extraction modules do not contain ColumnTransformer.
      3. Final Answer:

        from sklearn.compose import ColumnTransformer -> Option D
      4. Quick Check:

        ColumnTransformer is in compose module [OK]
      Hint: Remember: compose module for combining transformers [OK]
      Common Mistakes:
      • Importing from preprocessing instead of compose
      • Confusing pipeline with compose
      • Trying to import from feature_extraction
      3. Given the code below, what will be the output of print(transformed_data)?
      from sklearn.compose import ColumnTransformer
      from sklearn.preprocessing import StandardScaler, OneHotEncoder
      import numpy as np
      
      X = np.array([[1, 'red'], [2, 'blue'], [3, 'green']])
      
      ct = ColumnTransformer([
          ('num', StandardScaler(), [0]),
          ('cat', OneHotEncoder(), [1])
      ])
      
      transformed_data = ct.fit_transform(X)
      print(transformed_data)
      medium
      A. A numpy array with scaled numbers and one-hot encoded colors
      B. A list of original values without changes
      C. An error because StandardScaler cannot handle strings
      D. A numpy array with only scaled numbers

      Solution

      1. Step 1: Understand ColumnTransformer setup

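        np.array stores this mixed input as strings, but column 0 holds numeric strings that StandardScaler can convert to floats and scale; column 1 (colors) is one-hot encoded.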
      2. Step 2: Predict output structure

        Output is a numpy array combining scaled numeric values and one-hot encoded categorical values.
      3. Final Answer:

        A numpy array with scaled numbers and one-hot encoded colors -> Option A
      4. Quick Check:

        Mixed types transformed correctly = scaled + one-hot [OK]
      Hint: Remember: ColumnTransformer applies each transformer to specified columns [OK]
      Common Mistakes:
      • Expecting original data without transformation
      • Thinking StandardScaler will fail on mixed data
      • Ignoring one-hot encoding effect
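
The behaviour described in this solution can be checked with a sketch like the following. It reuses the question's code and adds two checks of its own (the `is_string_array` flag and a defensive densify step), which are additions for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

X = np.array([[1, 'red'], [2, 'blue'], [3, 'green']])
# NumPy stores mixed int/str input as an array of strings.
is_string_array = X.dtype.kind == 'U'

ct = ColumnTransformer([
    ('num', StandardScaler(), [0]),
    ('cat', OneHotEncoder(), [1]),
])
out = ct.fit_transform(X)

# Densify defensively in case a sparse matrix comes back.
dense = out.toarray() if hasattr(out, 'toarray') else np.asarray(out)
# Column 0 is the scaled number; columns 1-3 are the one-hot colors
# in sorted category order (blue, green, red).
```

With real data, a pandas DataFrame is usually a safer container than a mixed NumPy array, because each column keeps its own dtype.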
      4. What is wrong with the following code snippet using ColumnTransformer?
      from sklearn.compose import ColumnTransformer
      from sklearn.preprocessing import StandardScaler, OneHotEncoder
      import numpy as np
      
      X = np.array([[1, 'red'], [2, 'blue'], [3, 'green']])
      
      ct = ColumnTransformer([
          ('num', StandardScaler(), [0, 1]),
          ('cat', OneHotEncoder(), [1])
      ])
      
      transformed_data = ct.fit_transform(X)
      
      medium
      A. StandardScaler is applied to a string column causing an error
      B. OneHotEncoder is applied to a numeric column causing an error
      C. ColumnTransformer requires numeric data only
      D. No error, code runs fine

      Solution

      1. Step 1: Check columns assigned to StandardScaler

        StandardScaler is applied to columns 0 and 1, but column 1 contains strings.
      2. Step 2: Understand why this causes an error

        StandardScaler cannot process non-numeric strings, so fitting raises an error (a ValueError, because 'red' cannot be converted to a float).
      3. Final Answer:

        StandardScaler is applied to a string column causing an error -> Option A
      4. Quick Check:

        Scaler on strings = error [OK]
      Hint: Scaler only works on numeric columns [OK]
      Common Mistakes:
      • Applying scaler to categorical columns
      • Assuming ColumnTransformer auto-detects types
      • Ignoring column indices in transformer
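
A small sketch of the failure and its fix is shown below. The try/except wrapper and the variable names (`broken`, `fixed`, `error_message`) are additions for illustration; the two ColumnTransformer definitions mirror the question and its corrected form.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

X = np.array([[1, 'red'], [2, 'blue'], [3, 'green']])

broken = ColumnTransformer([
    ('num', StandardScaler(), [0, 1]),  # column 1 holds color names
    ('cat', OneHotEncoder(), [1]),
])

try:
    broken.fit_transform(X)
    error_message = None
except (ValueError, TypeError) as exc:
    # The scaler cannot turn a color name such as 'red' into a float.
    error_message = str(exc)

# The fix: give StandardScaler only the numeric column.
fixed = ColumnTransformer([
    ('num', StandardScaler(), [0]),
    ('cat', OneHotEncoder(), [1]),
])
fixed_shape = fixed.fit_transform(X).shape
```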
      5. You have a dataset with numeric columns ['age', 'income'] and categorical columns ['city', 'gender']. You want to scale numeric columns and one-hot encode categorical columns using ColumnTransformer. Which code snippet correctly sets this up?
      hard
      A. ColumnTransformer([('num', OneHotEncoder(), ['age', 'income']), ('cat', StandardScaler(), ['city', 'gender'])])
      B. ColumnTransformer([('num', StandardScaler(), ['city', 'gender']), ('cat', OneHotEncoder(), ['age', 'income'])])
      C. ColumnTransformer([('num', StandardScaler(), ['age', 'income']), ('cat', OneHotEncoder(), ['city', 'gender'])])
      D. ColumnTransformer([('num', StandardScaler(), ['age']), ('cat', OneHotEncoder(), ['income', 'city', 'gender'])])

      Solution

      1. Step 1: Identify correct transformers for each column type

        Numeric columns should be scaled with StandardScaler; categorical columns should be one-hot encoded.
      2. Step 2: Match columns to transformers correctly

        ColumnTransformer([('num', StandardScaler(), ['age', 'income']), ('cat', OneHotEncoder(), ['city', 'gender'])]) assigns numeric columns to StandardScaler and categorical columns to OneHotEncoder correctly.
      3. Final Answer:

        ColumnTransformer([('num', StandardScaler(), ['age', 'income']), ('cat', OneHotEncoder(), ['city', 'gender'])]) -> Option C
      4. Quick Check:

        Numeric scaled + categorical one-hot = ColumnTransformer([('num', StandardScaler(), ['age', 'income']), ('cat', OneHotEncoder(), ['city', 'gender'])]) [OK]
      Hint: Match scaler to numbers, encoder to categories [OK]
      Common Mistakes:
      • Swapping transformers between numeric and categorical
      • Mixing columns in wrong transformer
      • Leaving out columns
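
Option C can be tried on a small DataFrame as sketched below. The rows are hypothetical; only the column names come from the question. The second variant uses `make_column_selector` from `sklearn.compose` to pick columns by dtype, which is an alternative added here and not part of the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical rows; only the column names come from the question.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [30000.0, 45000.0, 80000.0, 62000.0],
    "city": ["Paris", "Rome", "Paris", "Oslo"],
    "gender": ["F", "M", "F", "M"],
})

# Option C: list the columns for each transformer by name.
by_name = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(), ["city", "gender"]),
])

# Alternative: choose columns by dtype instead of listing names.
by_dtype = ColumnTransformer([
    ("num", StandardScaler(), make_column_selector(dtype_include=np.number)),
    ("cat", OneHotEncoder(), make_column_selector(dtype_exclude=np.number)),
])

def to_dense(result):
    """Return a dense array whether the transformer gave sparse or dense output."""
    return result.toarray() if hasattr(result, "toarray") else np.asarray(result)

out_by_name = to_dense(by_name.fit_transform(df))
out_by_dtype = to_dense(by_dtype.fit_transform(df))
# 2 scaled columns + 3 city columns + 2 gender columns = 7 features
```

Selecting by dtype avoids retyping column names when the dataset grows, but it relies on every column having the dtype you expect, so listing names explicitly is the more predictable choice for small tables.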