Imagine you want to combine different sets of features extracted from the same data before training a model. What is the main purpose of using FeatureUnion in this context?
Think about how you might extract features in several different ways and then join the results before training.
FeatureUnion takes multiple feature extraction steps and concatenates their outputs side-by-side, creating a combined feature set for the model to learn from.
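As a rough illustration of this idea, the sketch below joins two feature extractors and feeds the combined result to a classifier. The iris dataset, the choice of PCA and SelectKBest as the two branches, and the LogisticRegression model are all assumptions made for the example, not part of the question.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

# Example data (an assumption for this sketch): X has shape (150, 4).
X, y = load_iris(return_X_y=True)

# Two feature extraction branches that both see the same input X.
features = FeatureUnion([
    ("pca", PCA(n_components=2)),   # contributes 2 columns
    ("kbest", SelectKBest(k=1)),    # contributes 1 column
])

# The union is an ordinary transformer, so it can sit in front of a model.
model = Pipeline([
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)

# Inspect the combined feature matrix the classifier was trained on.
combined = model.named_steps["features"].transform(X)
print(combined.shape)  # (150, 3)
```

Wrapping the union in a Pipeline keeps the feature extraction and the model fitted together, so the same combination is applied at prediction time.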
Given the following code, what is the shape of X_transformed?
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import numpy as np
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
fu = FeatureUnion([
    ('scale', StandardScaler()),
    ('pca', PCA(n_components=1))
])
X_transformed = fu.fit_transform(X)
Consider the output dimensions of each transformer and how FeatureUnion combines them.
The input X has shape (3, 3). StandardScaler outputs shape (3, 3), while PCA(n_components=1) outputs shape (3, 1). FeatureUnion concatenates these outputs along the feature axis, resulting in shape (3, 4).
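The shape arithmetic can be checked directly. The sketch below runs each transformer on its own, then compares the FeatureUnion output with a manual horizontal stack; the variable names `scaled`, `reduced`, and `stacked` are introduced here only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Run each transformer separately to see its individual output shape.
scaled = StandardScaler().fit_transform(X)
reduced = PCA(n_components=1).fit_transform(X)

# Run the same two transformers through FeatureUnion.
fu = FeatureUnion([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=1)),
])
X_transformed = fu.fit_transform(X)

print(scaled.shape, reduced.shape, X_transformed.shape)  # (3, 3) (3, 1) (3, 4)

# The union output should match stacking the two results column-wise.
stacked = np.hstack([scaled, reduced])
same = np.allclose(X_transformed, stacked)
```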
You want to build a FeatureUnion that combines text features and numeric features for a classification task. Which combination of transformers is most appropriate?
Think about which transformers are designed for text and which for numeric data.
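TfidfVectorizer converts text into numeric features weighted by term importance, and StandardScaler standardizes numeric features to zero mean and unit variance. This pairing suits data that mixes text and numeric columns. Note that FeatureUnion passes the same input to every transformer, so each branch needs a step that first selects its own columns (alternatively, ColumnTransformer handles the column routing for you).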
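One possible way to wire this up is sketched below. It assumes a toy object array whose first column is text and whose second column is numeric; the data and the helper functions `select_text` and `select_numeric` are hypothetical and exist only for this example.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Hypothetical toy data: column 0 is text, column 1 is a numeric value.
X = np.array([
    ["free prize click now", 12.0],
    ["meeting agenda attached", 3.0],
    ["click to win a prize", 15.0],
    ["quarterly report attached", 2.0],
], dtype=object)


def select_text(X):
    """Return the text column as a 1-D array of strings."""
    return X[:, 0]


def select_numeric(X):
    """Return the numeric column(s) as a 2-D float array."""
    return X[:, 1:].astype(float)


union = FeatureUnion([
    # Each branch first picks its own columns, then transforms them.
    ("text", Pipeline([
        ("select", FunctionTransformer(select_text)),
        ("tfidf", TfidfVectorizer()),
    ])),
    ("numeric", Pipeline([
        ("select", FunctionTransformer(select_numeric)),
        ("scale", StandardScaler()),
    ])),
])

combined = union.fit_transform(X)

# Number of TF-IDF columns equals the size of the learned vocabulary.
fitted_text = dict(union.transformer_list)["text"]
n_text = len(fitted_text.named_steps["tfidf"].vocabulary_)
print(combined.shape, n_text)
```

Because the TF-IDF branch produces a sparse matrix, the combined output is sparse as well; most scikit-learn linear classifiers accept it directly.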
What is the effect of setting n_jobs=-1 in a FeatureUnion?
Consider what n_jobs=-1 usually means in scikit-learn.
Setting n_jobs=-1 tells scikit-learn to use all available CPU cores to fit and apply the transformers in parallel, which can speed up feature extraction when the individual transformers are expensive.
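A small sketch of the setting in use follows. The random data and the `build_union` helper are assumptions for the example; on an input this small the parallel run is unlikely to be faster, so the point is only that the result does not change.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only to have something to transform.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))


def build_union(n_jobs):
    """Hypothetical helper: same transformers, different parallelism."""
    return FeatureUnion(
        [
            ("scale", StandardScaler()),
            # A deterministic solver so both runs are directly comparable.
            ("pca", PCA(n_components=3, svd_solver="full")),
        ],
        n_jobs=n_jobs,
    )


serial = build_union(None).fit_transform(X)   # one transformer at a time
parallel = build_union(-1).fit_transform(X)   # all available cores

# Parallelism changes how the work is scheduled, not the output.
results_match = np.allclose(serial, parallel)
print(serial.shape, results_match)
```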
Consider this code snippet:
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import numpy as np
X = np.array([[1, 2], [3, 4], [5, 6]])
fu = FeatureUnion([
    ('scale', StandardScaler()),
    ('pca', PCA(n_components=3))
])
X_transformed = fu.fit_transform(X)
Why does this code raise a ValueError?
Check the shape of X and the n_components parameter of PCA.
PCA cannot extract more components than min(n_samples, n_features). Here X has shape (3, 2), so at most 2 components are available, but PCA is asked for 3. The ValueError is raised when fit_transform is called, not when the FeatureUnion is constructed.
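The failure and one possible fix can be reproduced as follows. The `build_union` helper is hypothetical, and capping n_components at min(X.shape) is just one reasonable choice for this sketch.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)


def build_union(n_components):
    """Hypothetical helper that builds the union from the question."""
    return FeatureUnion([
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=n_components)),
    ])


# Requesting 3 components from 2 features fails at fit time.
try:
    build_union(3).fit_transform(X)
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")

# Cap n_components at min(n_samples, n_features), which is 2 here.
max_components = min(X.shape)
X_fixed = build_union(max_components).fit_transform(X)
print(X_fixed.shape)  # (3, 4)
```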