Challenge - 5 Problems
ColumnTransformer Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
Intermediate
Output of ColumnTransformer with numeric and categorical features
What is the shape of the transformed data after applying this ColumnTransformer?
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import numpy as np
X = np.array([[1, 'red'], [2, 'blue'], [3, 'green']])
ct = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), [0]),
        ('cat', OneHotEncoder(), [1])
    ]
)
X_transformed = ct.fit_transform(X)
print(X_transformed.shape)
💡 Hint
Remember that OneHotEncoder creates one column per unique category.
📝 Explanation
The numeric column is scaled to 1 feature. The categorical column has 3 unique values ('red', 'blue', 'green'), so OneHotEncoder creates 3 columns. Total columns = 1 + 3 = 4; the row count stays 3, so the shape is (3, 4).
❓ Model Choice
Intermediate
Choosing transformers for mixed data types
You have a dataset with numeric, categorical, and text columns. Which set of transformers is best to use in a ColumnTransformer?
💡 Hint
Text data needs vectorization, categorical data needs encoding, numeric data needs scaling.
📝 Explanation
StandardScaler is common for numeric data. OneHotEncoder is suitable for categorical data to avoid ordinal assumptions. TfidfVectorizer is a good choice for text data to capture word importance.
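One possible layout combining all three transformer types (column names here are invented for illustration). Note that TfidfVectorizer expects a 1-D array of strings, so its column is given as a single name rather than a list:

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

df = pd.DataFrame({
    'age': [25, 32, 47],                                  # numeric
    'city': ['NY', 'LA', 'NY'],                           # categorical
    'bio': ['likes ml', 'writes code', 'reads papers'],   # text
})

ct = ColumnTransformer(transformers=[
    ('num', StandardScaler(), ['age']),
    ('cat', OneHotEncoder(), ['city']),
    # a scalar column name (not a list) passes a 1-D column to the vectorizer
    ('txt', TfidfVectorizer(), 'bio'),
])
X = ct.fit_transform(df)
print(X.shape)  # (3, 9): 1 scaled + 2 one-hot + 6 tf-idf vocabulary columns
```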
❓ Hyperparameter
Advanced
Effect of 'remainder' parameter in ColumnTransformer
What happens if you set remainder='passthrough' in a ColumnTransformer?
💡 Hint
Think about how to keep columns unchanged in the output.
📝 Explanation
Setting remainder='passthrough' keeps columns not listed in transformers unchanged and includes them in the output.
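A minimal sketch of the behavior: only column 'a' is listed in the transformers, and `remainder='passthrough'` appends the untouched 'b' column to the output (the default, `remainder='drop'`, would discard it):

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [10, 20, 30]})

ct = ColumnTransformer(
    transformers=[('num', StandardScaler(), ['a'])],
    remainder='passthrough',
)
out = ct.fit_transform(df)

print(out.shape)   # (3, 2): scaled 'a' plus the passed-through 'b'
print(out[:, 1])   # [10. 20. 30.] — original values, unchanged
```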
🔧 Debug
Advanced
Error when fitting ColumnTransformer with incompatible data types
What error will this code raise and why?
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import pandas as pd
X = pd.DataFrame({'num': [1, 2, 3], 'color': ['red', 'blue', 4]})
ct = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), ['num']),
        ('cat', OneHotEncoder(), ['color'])
    ]
)
ct.fit(X)
💡 Hint
Check the data types in the categorical column.
📝 Explanation
The categorical column mixes strings ('red', 'blue') with an integer (4). scikit-learn's encoders require their input to be uniformly strings or numbers, so fitting the OneHotEncoder raises a TypeError.
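One common fix, sketched here, is to cast the mixed-type column to a single type before encoding, so every category compares cleanly:

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import pandas as pd

X = pd.DataFrame({'num': [1, 2, 3], 'color': ['red', 'blue', 4]})

# Cast the mixed-type column to str so all categories share one type
X['color'] = X['color'].astype(str)

ct = ColumnTransformer(transformers=[
    ('num', StandardScaler(), ['num']),
    ('cat', OneHotEncoder(), ['color']),
])
out = ct.fit_transform(X)  # fits without error now
print(out.shape)  # (3, 4): 1 scaled column + 3 one-hot categories
```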
🧠 Conceptual
Expert
Why use ColumnTransformer in a machine learning pipeline?
Which is the best explanation for why ColumnTransformer is useful when working with mixed data types?
💡 Hint
Think about handling numeric and categorical data differently.
📝 Explanation
ColumnTransformer lets you apply different transformations to different columns in a single step, making preprocessing easier and cleaner.
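In practice this single-step preprocessing slots directly into a Pipeline, so scaling, encoding, and model fitting all happen with one `fit` call (a sketch with made-up data):

```python
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
import pandas as pd

df = pd.DataFrame({
    'age': [25, 32, 47, 51],
    'city': ['NY', 'LA', 'NY', 'SF'],
})
y = [0, 1, 0, 1]

# One preprocessing object handles both column types, then feeds the model
pipe = Pipeline([
    ('prep', ColumnTransformer([
        ('num', StandardScaler(), ['age']),
        ('cat', OneHotEncoder(handle_unknown='ignore'), ['city']),
    ])),
    ('clf', LogisticRegression()),
])
pipe.fit(df, y)
preds = pipe.predict(df)  # one prediction per input row
print(preds)
```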