Choose the best description of why we use feature engineering pipelines in machine learning operations.
Think about what happens before training a model with raw data.
Feature engineering pipelines automate and standardize how raw data is transformed into features, ensuring consistency and reproducibility.
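As a minimal sketch of this idea (the synthetic data and step names are illustrative, assuming scikit-learn is available), a single Pipeline object captures the whole transformation sequence, so the statistics learned on training data are applied identically to any new data:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
import numpy as np

# Raw training data with a missing value (illustrative)
X_train = np.array([[1.0, 200.0], [2.0, np.nan], [3.0, 400.0]])

# One pipeline object standardizes the transformation sequence
features = Pipeline([
    ('impute', SimpleImputer(strategy='median')),
    ('scale', StandardScaler()),
])

features.fit(X_train)             # learn statistics from training data
X_new = np.array([[2.5, 250.0]])
print(features.transform(X_new))  # the same transformations, reproducibly applied
```

Because the fitted pipeline holds the imputation and scaling statistics internally, serving code cannot accidentally recompute them on new data.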
Given the following Python code snippet, what is the output shape of X_transformed?
```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import numpy as np

X = np.array([[25, 'red'], [30, 'blue'], [22, 'green']])

preprocessor = ColumnTransformer(transformers=[
    ('num', StandardScaler(), [0]),
    ('cat', OneHotEncoder(), [1]),
])
X_transformed = preprocessor.fit_transform(X)
print(X_transformed.shape)
```
Count numeric and categorical features after transformation.
The numeric column is scaled to 1 feature, and the categorical column with 3 unique values is one-hot encoded to 3 features, for a total of 4 features; with 3 samples, the output shape is (3, 4).
Arrange the following steps in the correct order to create a feature engineering pipeline.
Think about understanding data first, then defining transformations, then implementation, then testing.
First identify the data and its types, then define the required transformations, implement the pipeline, and finally test it on sample data.
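The four steps above can be sketched as follows (the data, column indices, and transformer choices are illustrative assumptions):

```python
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import numpy as np

# Step 1: identify data and types (here: column 0 numeric, column 1 categorical)
X = np.array([[1.5, 'a'], [2.0, 'b'], [0.5, 'a']], dtype=object)
num_cols, cat_cols = [0], [1]

# Steps 2 and 3: define the transformations and implement the pipeline
pipeline = Pipeline([
    ('features', ColumnTransformer([
        ('num', StandardScaler(), num_cols),
        ('cat', OneHotEncoder(handle_unknown='ignore'), cat_cols),
    ])),
])

# Step 4: test the pipeline on sample data
Xt = pipeline.fit_transform(X)
assert Xt.shape == (3, 3)  # 1 scaled feature + 2 one-hot features
```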
Consider this code snippet. Why must the SimpleImputer step come before the StandardScaler step in the pipeline?

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
import numpy as np

X = np.array([[1, 2], [np.nan, 3], [7, 6]])

pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler()),
])
pipeline.fit(X)
```

Think about which step should handle missing values first.
Imputation must happen before scaling: SimpleImputer replaces the NaN with the column mean, so StandardScaler receives complete data. If the scaler ran first, the missing value would pass through unhandled, and many downstream estimators raise an error when they encounter NaN.
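A quick check confirms that with imputation first, no missing values survive the pipeline (assuming scikit-learn is installed):

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
import numpy as np

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 6.0]])
pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler()),
])
Xt = pipeline.fit_transform(X)

# The NaN in column 0 is replaced by the column mean (4.0) before scaling,
# so the standardized output contains no missing values.
print(np.isnan(Xt).any())  # False
```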
Choose the best practice that helps maintain reproducibility of machine learning models when using feature engineering pipelines.
Think about how to keep track of changes and ensure the same transformations are applied later.
Version-controlling the pipeline code and storing the fitted artifacts ensures the exact transformations can be reproduced later.
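One common way to put this into practice is to commit the pipeline-building code to version control and persist the fitted pipeline as an artifact with joblib (the filename and data below are illustrative assumptions):

```python
import joblib
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer

pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler()),
])
X_train = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 6.0]])
pipeline.fit(X_train)

# Persist the fitted pipeline; this file is the versioned artifact
joblib.dump(pipeline, 'feature_pipeline_v1.joblib')  # illustrative name

# Later (e.g., at inference time): reload and apply the exact same transformations
restored = joblib.load('feature_pipeline_v1.joblib')
Xt = restored.transform(np.array([[2.0, 4.0]]))
```

Storing the artifact alongside the code revision that produced it ties each model to the precise preprocessing it was trained with.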