ColumnTransformer for mixed types in ML Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of ColumnTransformer with numeric and categorical features
What is the shape of the transformed data after applying this ColumnTransformer?
ML Python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import numpy as np

X = np.array([[1, 'red'], [2, 'blue'], [3, 'green']])

ct = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), [0]),
        ('cat', OneHotEncoder(), [1])
    ]
)

X_transformed = ct.fit_transform(X)
print(X_transformed.shape)
A. (3, 4)
B. (3, 3)
C. (3, 5)
D. (3, 2)
💡 Hint
Remember that OneHotEncoder creates one column per unique category.
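To see the hint in isolation, here is a minimal sketch that runs OneHotEncoder on its own. The `sizes` column is a made-up toy example, not data from the challenge.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy column: four rows, three distinct sizes.
sizes = np.array([["S"], ["M"], ["L"], ["M"]])

encoder = OneHotEncoder()
# fit_transform returns a SciPy sparse matrix by default; densify it for display.
encoded = encoder.fit_transform(sizes).toarray()

print(encoder.categories_[0])  # categories are sorted: ['L' 'M' 'S']
print(encoded.shape)           # (4, 3): one output column per unique category
```

Each row contains a single 1 in the column of its category, so the number of output columns depends on the number of distinct values, not on the number of rows.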
Model Choice
intermediate
Choosing transformers for mixed data types
You have a dataset with numeric, categorical, and text columns. Which set of transformers is best to use in a ColumnTransformer?
A. RobustScaler for numeric, OrdinalEncoder for categorical, HashingVectorizer for text
B. StandardScaler for numeric, OneHotEncoder for categorical, TfidfVectorizer for text
C. MinMaxScaler for numeric, LabelEncoder for categorical, CountVectorizer for text
D. Normalizer for numeric, OneHotEncoder for categorical, LabelEncoder for text
💡 Hint
Text data needs vectorization, categorical data needs encoding, numeric data needs scaling.
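Whichever transformers you choose, it helps to know how a text column is wired into a ColumnTransformer. The sketch below is one possible setup, not the only valid one. The DataFrame and its column names (`price`, `color`, `review`) are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data; the column names are made up for this sketch.
df = pd.DataFrame({
    "price": [10.0, 20.0, 30.0],
    "color": ["red", "blue", "red"],
    "review": ["good product", "bad product", "good value"],
})

ct = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["price"]),
        ("cat", OneHotEncoder(), ["color"]),
        # Text vectorizers expect a 1-D sequence of strings, so the column
        # is selected with a plain string rather than a list.
        ("txt", TfidfVectorizer(), "review"),
    ],
    sparse_threshold=0,  # always return a dense array, for easy inspection
)

features = ct.fit_transform(df)
vocab_size = len(ct.named_transformers_["txt"].vocabulary_)
print(features.shape)  # (3, 1 + 2 + vocab_size)
```

The list-versus-string distinction matters: `["price"]` hands the transformer a 2-D slice, while `"review"` hands it a 1-D column, which is what the text vectorizers work on.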
Hyperparameter
advanced
Effect of 'remainder' parameter in ColumnTransformer
What happens if you set remainder='passthrough' in a ColumnTransformer?
A. An error is raised if columns are missing in transformers
B. Columns not specified are dropped from the output
C. All columns are transformed regardless of specification
D. Columns not specified in transformers are passed through without changes
💡 Hint
Think about how to keep columns unchanged in the output.
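A small experiment makes the `remainder` parameter concrete. This sketch uses a made-up numeric array and compares the default setting with `remainder='passthrough'`.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric data: three rows, three columns.
X = np.array([[1.0, 10.0, 100.0],
              [2.0, 20.0, 200.0],
              [3.0, 30.0, 300.0]])

# Only column 0 is listed in both cases; the default is remainder='drop'.
dropping = ColumnTransformer([("num", StandardScaler(), [0])])
keeping = ColumnTransformer([("num", StandardScaler(), [0])],
                            remainder="passthrough")

out_drop = dropping.fit_transform(X)
out_keep = keeping.fit_transform(X)

print(out_drop.shape)  # (3, 1)
print(out_keep.shape)  # (3, 3)
print(out_keep[:, 1:])  # columns 1 and 2, placed after the scaled column
```

Note that the unlisted columns come after the transformed ones in the output, so the column order can differ from the input.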
🔧 Debug
advanced
Error when fitting ColumnTransformer with incompatible data types
What error will this code raise and why?
ML Python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import numpy as np

X = np.array([[1, 'red'], [2, 'blue'], [3, 4]])

ct = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), [0]),
        ('cat', OneHotEncoder(), [1])
    ]
)

ct.fit(X)
A. TypeError because column 1 has mixed types (strings and integers)
B. TypeError because StandardScaler cannot process strings
C. No error, the code runs successfully
D. IndexError because column 1 does not exist
💡 Hint
Check the data types in the categorical column.
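One way to follow the hint is to inspect the array before it ever reaches scikit-learn. The sketch below uses only NumPy and shows what `np.array` does with a list that mixes integers and strings. The `dtype=object` variant is an addition for comparison and is not part of the challenge code.

```python
import numpy as np

# Same literal as in the challenge: Python ints and strs mixed in one list.
X = np.array([[1, 'red'], [2, 'blue'], [3, 4]])

print(X.dtype.kind)  # 'U': NumPy picked a fixed-width Unicode string dtype
print(X[:, 1])       # ['red' 'blue' '4']

# For comparison: dtype=object keeps the original Python objects.
X_obj = np.array([[1, 'red'], [2, 'blue'], [3, 4]], dtype=object)
print(type(X_obj[2, 1]))  # <class 'int'>
```

Without an explicit dtype, NumPy stores every element of the array with one common type, so what the transformers receive may differ from what the literal suggests.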
🧠 Conceptual
expert
Why use ColumnTransformer in a machine learning pipeline?
Which is the best explanation for why ColumnTransformer is useful when working with mixed data types?
A. It replaces the need for feature selection by removing irrelevant columns
B. It automatically detects data types and chooses the best model
C. It allows applying different preprocessing steps to different columns in one step, simplifying pipelines
D. It converts all data to numeric format without any user input
💡 Hint
Think about handling numeric and categorical data differently.
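To make the pipeline angle concrete, here is a minimal sketch that places a ColumnTransformer in front of a classifier. The toy data, the labels, and the choice of LogisticRegression are assumptions made for illustration only.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; dtype=object keeps the ints and strings as they are.
X = np.array([[1, 'red'], [2, 'blue'], [3, 'green'],
              [4, 'red'], [5, 'blue'], [6, 'green']], dtype=object)
y = np.array([0, 0, 0, 1, 1, 1])

preprocess = ColumnTransformer([
    ('num', StandardScaler(), [0]),                        # scale the numeric column
    ('cat', OneHotEncoder(handle_unknown='ignore'), [1]),  # encode the colors
])

# Preprocessing and model are fitted together with a single call.
model = Pipeline([('prep', preprocess), ('clf', LogisticRegression())])
model.fit(X, y)

predictions = model.predict(X)
print(predictions.shape)  # (6,)
```

Because the preprocessing lives inside the pipeline, the same column-wise steps are reapplied automatically at prediction time.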

Practice

1. What is the main purpose of using ColumnTransformer in machine learning?
easy
A. To train multiple models on the same dataset
B. To apply different preprocessing steps to different columns in a dataset
C. To visualize data distributions
D. To split data into training and testing sets

Solution

  1. Step 1: Understand the role of ColumnTransformer

    ColumnTransformer applies different transformations to different columns, such as scaling numeric columns and encoding categorical ones.
  2. Step 2: Compare with other options

    Training models, visualizing data, or splitting data are different tasks not handled by ColumnTransformer.
  3. Final Answer:

    To apply different preprocessing steps to different columns in a dataset -> Option B
  4. Quick Check:

    ColumnTransformer = Different preprocessing per column [OK]
Hint: Think: Different columns, different treatments [OK]
Common Mistakes:
  • Confusing ColumnTransformer with model training
  • Thinking it splits data instead of transforming
  • Assuming it visualizes data
2. Which of the following is the correct way to import ColumnTransformer from scikit-learn?
easy
A. from sklearn.feature_extraction import ColumnTransformer
B. from sklearn.preprocessing import ColumnTransformer
C. from sklearn.pipeline import ColumnTransformer
D. from sklearn.compose import ColumnTransformer

Solution

  1. Step 1: Recall the module for ColumnTransformer

    ColumnTransformer is part of the compose module in scikit-learn.
  2. Step 2: Verify other options

    The preprocessing, pipeline, and feature_extraction modules do not contain ColumnTransformer.
  3. Final Answer:

    from sklearn.compose import ColumnTransformer -> Option D
  4. Quick Check:

    ColumnTransformer is in compose module [OK]
Hint: Remember: compose module for combining transformers [OK]
Common Mistakes:
  • Importing from preprocessing instead of compose
  • Confusing pipeline with compose
  • Trying to import from feature_extraction
3. Given the code below, what will be the output of print(transformed_data)?
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import numpy as np

X = np.array([[1, 'red'], [2, 'blue'], [3, 'green']])

ct = ColumnTransformer([
    ('num', StandardScaler(), [0]),
    ('cat', OneHotEncoder(), [1])
])

transformed_data = ct.fit_transform(X)
print(transformed_data)
medium
A. A numpy array with scaled numbers and one-hot encoded colors
B. A list of original values without changes
C. An error because StandardScaler cannot handle strings
D. A numpy array with only scaled numbers

Solution

  1. Step 1: Understand ColumnTransformer setup

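    Column 0 (numbers) is scaled; column 1 (colors) is one-hot encoded. np.array stores this mixed list as strings, so column 0 holds '1', '2', '3'; StandardScaler converts these numeric strings to floats before scaling.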
  2. Step 2: Predict output structure

    Output is a numpy array of shape (3, 4): one scaled numeric column followed by three one-hot columns, one per color.
  3. Final Answer:

    A numpy array with scaled numbers and one-hot encoded colors -> Option A
  4. Quick Check:

    Mixed types transformed correctly = scaled + one-hot [OK]
Hint: Remember: ColumnTransformer applies each transformer to specified columns [OK]
Common Mistakes:
  • Expecting original data without transformation
  • Thinking StandardScaler will fail, even though it is applied only to column 0
  • Ignoring one-hot encoding effect
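Running the snippet from this question confirms the structure of the result. The sketch below repeats the question's code and adds a few inspection lines. The `dense` helper variable is an addition, there because ColumnTransformer can return a sparse matrix when the stacked result is sparse enough (see its `sparse_threshold` parameter).

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

X = np.array([[1, 'red'], [2, 'blue'], [3, 'green']])

ct = ColumnTransformer([
    ('num', StandardScaler(), [0]),
    ('cat', OneHotEncoder(), [1]),
])
transformed_data = ct.fit_transform(X)

# Densify only if a sparse matrix came back, so the lines below always work.
dense = (transformed_data.toarray()
         if hasattr(transformed_data, "toarray") else transformed_data)

print(dense.shape)  # (3, 4)
print(ct.named_transformers_['cat'].categories_[0])  # ['blue' 'green' 'red']
# Column 0 is the scaled number; columns 1-3 are blue, green, red.
print(np.round(dense, 2))
```

The one-hot columns follow the sorted category order, so the first row (1, 'red') has its 1 in the last column.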
4. What is wrong with the following code snippet using ColumnTransformer?
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import numpy as np

X = np.array([[1, 'red'], [2, 'blue'], [3, 'green']])

ct = ColumnTransformer([
    ('num', StandardScaler(), [0, 1]),
    ('cat', OneHotEncoder(), [1])
])

transformed_data = ct.fit_transform(X)
medium
A. StandardScaler is applied to a string column causing an error
B. OneHotEncoder is applied to a numeric column causing an error
C. ColumnTransformer requires numeric data only
D. No error, code runs fine

Solution

  1. Step 1: Check columns assigned to StandardScaler

    StandardScaler is applied to columns 0 and 1, but column 1 contains strings.
  2. Step 2: Understand why this causes an error

    StandardScaler cannot convert non-numeric strings such as 'red' to floats, so fit_transform raises a ValueError (could not convert string to float).
  3. Final Answer:

    StandardScaler is applied to a string column causing an error -> Option A
  4. Quick Check:

    Scaler on strings = error [OK]
Hint: Scaler only works on numeric columns [OK]
Common Mistakes:
  • Applying scaler to categorical columns
  • Assuming ColumnTransformer auto-detects types
  • Ignoring column indices in transformer
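The failure is easy to reproduce and to fix. This sketch wraps the question's code in a try/except and then reruns it with the scaler restricted to column 0. The variable names `broken`, `fixed`, and `error` are introduced here for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

X = np.array([[1, 'red'], [2, 'blue'], [3, 'green']])

# As in the question: the scaler is wrongly given the color column too.
broken = ColumnTransformer([
    ('num', StandardScaler(), [0, 1]),
    ('cat', OneHotEncoder(), [1]),
])

error = None
try:
    broken.fit_transform(X)
except ValueError as exc:
    # The scaler tries to cast 'red' to a float and fails.
    error = exc
    print(type(exc).__name__, exc)

# Fix: scale only column 0 and leave column 1 to the encoder.
fixed = ColumnTransformer([
    ('num', StandardScaler(), [0]),
    ('cat', OneHotEncoder(), [1]),
])
fixed_shape = fixed.fit_transform(X).shape
print(fixed_shape)  # (3, 4)
```

ColumnTransformer never inspects column types on its own, so the column indices passed to each transformer are the only safeguard.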
5. You have a dataset with numeric columns ['age', 'income'] and categorical columns ['city', 'gender']. You want to scale numeric columns and one-hot encode categorical columns using ColumnTransformer. Which code snippet correctly sets this up?
hard
A. ColumnTransformer([('num', OneHotEncoder(), ['age', 'income']), ('cat', StandardScaler(), ['city', 'gender'])])
B. ColumnTransformer([('num', StandardScaler(), ['city', 'gender']), ('cat', OneHotEncoder(), ['age', 'income'])])
C. ColumnTransformer([('num', StandardScaler(), ['age', 'income']), ('cat', OneHotEncoder(), ['city', 'gender'])])
D. ColumnTransformer([('num', StandardScaler(), ['age']), ('cat', OneHotEncoder(), ['income', 'city', 'gender'])])

Solution

  1. Step 1: Identify correct transformers for each column type

    Numeric columns should be scaled with StandardScaler; categorical columns should be one-hot encoded.
  2. Step 2: Match columns to transformers correctly

    ColumnTransformer([('num', StandardScaler(), ['age', 'income']), ('cat', OneHotEncoder(), ['city', 'gender'])]) assigns numeric columns to StandardScaler and categorical columns to OneHotEncoder correctly.
  3. Final Answer:

    ColumnTransformer([('num', StandardScaler(), ['age', 'income']), ('cat', OneHotEncoder(), ['city', 'gender'])]) -> Option C
  4. Quick Check:

    Numeric scaled + categorical one-hot = Option C [OK]
Hint: Match scaler to numbers, encoder to categories [OK]
Common Mistakes:
  • Swapping transformers between numeric and categorical
  • Mixing columns in wrong transformer
  • Leaving out columns