Feature union combines different sets of features into one big set. The goal is to improve model performance by using more information. So, the main metrics to watch are those that measure how well the model predicts using these combined features. For classification, accuracy, precision, recall, and F1 score matter. For regression, mean squared error or R-squared are important. These metrics tell us if adding features helps the model learn better.
Feature union in ML Python - Model Metrics & Evaluation
Imagine a binary classification model using features combined by feature union. Here is a confusion matrix from test data:
| | Predicted Positive | Predicted Negative |
|-----------------|--------------------|--------------------|
| Actual Positive | True Positive (TP): 50 | False Negative (FN): 10 |
| Actual Negative | False Positive (FP): 5 | True Negative (TN): 35 |
Total samples = 50 + 10 + 5 + 35 = 100
From this matrix, we calculate:
- Precision = TP / (TP + FP) = 50 / (50 + 5) ≈ 0.91
- Recall = TP / (TP + FN) = 50 / (50 + 10) ≈ 0.83
- Accuracy = (TP + TN) / Total = (50 + 35) / 100 = 0.85
- F1 Score = 2 * (Precision * Recall) / (Precision + Recall) = 2 * (0.91 * 0.83) / (0.91 + 0.83) ≈ 0.87
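The arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch that uses only the four counts from the confusion matrix; the variable names are chosen here for illustration.

```python
# Counts taken from the confusion matrix above.
tp, fp, fn, tn = 50, 5, 10, 35

total = tp + fp + fn + tn
precision = tp / (tp + fp)          # share of predicted positives that are correct
recall = tp / (tp + fn)             # share of actual positives that are found
accuracy = (tp + tn) / total        # share of all predictions that are correct
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"accuracy={accuracy:.2f} f1={f1:.2f}")
# prints: precision=0.91 recall=0.83 accuracy=0.85 f1=0.87
```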
When combining features, sometimes the model becomes better at finding positives (higher recall) but may also make more mistakes (lower precision). The two metrics answer different questions:
- High precision means most predicted positives are correct. Useful when false alarms are costly, like spam filters.
- High recall means most real positives are found. Important when missing positives is bad, like disease detection.
Feature union can help balance this by adding features that improve recall without hurting precision too much. But adding too many features can also introduce noise and make overfitting more likely, which can lower both.
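The precision/recall tradeoff itself is not specific to feature union. One simple way to see it is to vary the decision threshold on a model's scores. The sketch below uses hypothetical labels and scores invented for illustration; `precision_recall` is a helper defined here, not a library function.

```python
def precision_recall(labels, scores, threshold):
    """Return (precision, recall) when scores >= threshold count as positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical labels and model scores, invented for illustration.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.35, 0.2, 0.1, 0.3, 0.05]

strict = precision_recall(labels, scores, threshold=0.5)
loose = precision_recall(labels, scores, threshold=0.3)
print("threshold 0.5:", strict)  # (0.75, 0.6)
print("threshold 0.3:", loose)   # recall rises to 1.0, precision drops below 0.75
```

Lowering the threshold finds every positive in this toy data, at the cost of two extra false alarms.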
Good values:
- Accuracy above baseline (better than simple model)
- Precision and recall both above 0.8, showing balanced performance
- F1 score close to or above 0.85, indicating good overall prediction
Bad values:
- Accuracy close to random guess (e.g., 50% for balanced classes)
- Precision very low (e.g., below 0.5), meaning many false positives
- Recall very low (e.g., below 0.5), meaning many missed positives
- F1 score low, showing poor balance between precision and recall
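To judge whether a score is "above baseline", it helps to compute the baseline explicitly. The following is a rough sketch, assuming synthetic data from `make_classification` and an arbitrary choice of transformers (PCA plus `SelectKBest`) and classifier; none of these choices come from the text above.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.dummy import DummyClassifier
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import FeatureUnion, Pipeline

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Baseline: always predict the most frequent class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

# Model: features from two transformers are concatenated, then classified.
model = Pipeline([
    ("features", FeatureUnion([
        ("pca", PCA(n_components=3)),
        ("kbest", SelectKBest(k=3)),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
model_acc = accuracy_score(y_test, model.predict(X_test))
model_f1 = f1_score(y_test, model.predict(X_test))
print(f"baseline accuracy={baseline_acc:.2f}")
print(f"model accuracy={model_acc:.2f} f1={model_f1:.2f}")
```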
Common pitfalls:
- Accuracy paradox: High accuracy can be misleading if classes are imbalanced. Feature union might add features that help the majority class only.
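- Data leakage: Building the combined features with information from future or test data (for example, fitting the transformers on the full dataset before splitting) can inflate metrics falsely.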
- Overfitting: Adding too many features can make the model memorize training data, causing poor test performance.
- Ignoring metric tradeoffs: Focusing only on accuracy without checking precision and recall can hide problems.
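One way to guard against several of these pitfalls at once is to put the FeatureUnion inside a `Pipeline` and evaluate with cross-validation on more than one metric, so the transformers are refit on each training fold only. This is a sketch under assumptions: the imbalanced synthetic data, the transformers, and the classifier are all illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced data (roughly 10% positives), for illustration only.
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)

# Because the union lives inside the pipeline, each CV fold fits the
# scaler and PCA on its own training split, never on the held-out split.
model = Pipeline([
    ("features", FeatureUnion([
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=2)),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])

metrics = ["accuracy", "precision", "recall", "f1"]
results = cross_validate(model, X, y, cv=5, scoring=metrics)
for metric in metrics:
    print(f"{metric}: {results[f'test_{metric}'].mean():.3f}")
```

Reporting precision and recall next to accuracy makes it easier to notice when a high accuracy comes mostly from the majority class.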
Your model using feature union has 98% accuracy but only 12% recall on the positive class (e.g., fraud). Is it good for production? Why or why not?
Answer: No, it is not good. The low recall means the model misses most positive cases (fraud). Even though accuracy is high, it likely predicts most samples as negative. For fraud detection, missing fraud is very bad, so recall is more important than accuracy here.
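A small numeric sketch shows how 98% accuracy and 12% recall can coexist. The counts below are hypothetical, chosen only so that they reproduce those two figures.

```python
# Hypothetical confusion-matrix counts for 10,000 transactions,
# chosen so that accuracy is 98% and recall is 12%.
tp, fn = 24, 176      # 200 actual fraud cases, only 24 caught
fp, tn = 24, 9776     # 9,800 legitimate cases

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)
precision = tp / (tp + fp)

# A "model" that always predicts "not fraud" gets every negative right.
always_negative_accuracy = (fp + tn) / total

print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
# prints: accuracy=0.98 recall=0.12 precision=0.50
print(f"always-negative accuracy={always_negative_accuracy:.2f}")
# prints: always-negative accuracy=0.98
```

With these counts, the model is no more accurate than predicting "not fraud" for every sample, which is why accuracy alone says little here.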
Practice
What is the purpose of FeatureUnion in machine learning?
Solution
Step 1: Understand FeatureUnion's role
FeatureUnion is used to combine different feature extraction methods so their outputs join into one feature set.
Step 2: Compare with other options
Splitting data, feature selection, and model averaging are different tasks not done by FeatureUnion.
Final Answer:
To combine multiple feature extraction methods into a single feature set -> Option A
Quick Check:
FeatureUnion = Combine features [OK]
Common mistakes:
- Confusing FeatureUnion with data splitting
- Thinking it selects features instead of combining
- Mixing it up with model ensemble methods
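As a concrete illustration of "combining feature extraction methods", the sketch below joins PCA components with a univariately selected column on the Iris dataset. The dataset and the two transformers are illustrative choices, not something prescribed by the question.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion

X, y = load_iris(return_X_y=True)   # 150 samples, 4 features

# Two different ways of deriving features from the same input.
union = FeatureUnion([
    ("pca", PCA(n_components=2)),    # 2 projected components
    ("kbest", SelectKBest(k=1)),     # 1 original column, chosen by score
])

X_combined = union.fit_transform(X, y)
print(X.shape, "->", X_combined.shape)  # (150, 4) -> (150, 3)
```

The rows are unchanged; only the columns from each transformer are placed side by side.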
Which option correctly creates a FeatureUnion with two transformers named 'tf1' and 'tf2'?
Solution
Step 1: Recall FeatureUnion syntax
FeatureUnion expects a list of tuples, each tuple with a name and a transformer.
Step 2: Check each option
FeatureUnion([('tf1', transformer1), ('tf2', transformer2)]) uses a list of tuples correctly. The other options use the wrong data structure or omit the list.
Final Answer:
FeatureUnion([('tf1', transformer1), ('tf2', transformer2)]) -> Option C
Quick Check:
FeatureUnion needs list of (name, transformer) tuples [OK]
Common mistakes:
- Passing a dictionary instead of list of tuples
- Passing transformers without names
- Passing transformers as separate arguments
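The list-of-tuples form can be tried directly. In this sketch, `transformer1` and `transformer2` are stand-ins (a `StandardScaler` and a `PCA`) chosen for illustration. scikit-learn also provides `make_union`, which takes transformers as separate arguments and generates the names automatically.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion, make_union
from sklearn.preprocessing import StandardScaler

# Stand-in transformers; any scikit-learn transformers would do.
transformer1 = StandardScaler()
transformer2 = PCA(n_components=1)

# Explicit names: a list of (name, transformer) tuples.
union = FeatureUnion([("tf1", transformer1), ("tf2", transformer2)])
names = [name for name, _ in union.transformer_list]
print(names)  # ['tf1', 'tf2']

# make_union derives names from the lowercased class names.
auto = make_union(StandardScaler(), PCA(n_components=1))
auto_names = [name for name, _ in auto.transformer_list]
print(auto_names)  # ['standardscaler', 'pca']
```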
What is the shape of X_transformed after running this code?
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import numpy as np
X = np.array([[1, 2, 3], [4, 5, 6]])
union = FeatureUnion([
    ('scale', StandardScaler()),
    ('pca', PCA(n_components=1))
])
X_transformed = union.fit_transform(X)
Solution
Step 1: Analyze each transformer output
StandardScaler keeps the original shape (2 samples, 3 features), so its output shape is (2, 3). PCA with n_components=1 outputs (2, 1).
Step 2: Combine outputs with FeatureUnion
FeatureUnion concatenates the outputs horizontally: (2, 3) + (2, 1) = (2, 4).
Final Answer:
(2, 4) -> Option D
Quick Check:
Concatenate (2, 3) and (2, 1) = (2, 4) [OK]
Common mistakes:
- Assuming PCA output replaces original features
- Thinking FeatureUnion stacks vertically
- Ignoring output shapes of individual transformers
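The horizontal concatenation can be verified by running each transformer on its own and stacking the results with `np.hstack`. This is a minimal check on the same input, not a description of how FeatureUnion is implemented internally.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 2, 3], [4, 5, 6]])

# Each transformer on its own.
scaled = StandardScaler().fit_transform(X)        # shape (2, 3)
projected = PCA(n_components=1).fit_transform(X)  # shape (2, 1)
manual = np.hstack([scaled, projected])           # shape (2, 4)

# The same two transformers combined with FeatureUnion.
union = FeatureUnion([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=1)),
])
X_transformed = union.fit_transform(X)

print(scaled.shape, projected.shape, X_transformed.shape)  # (2, 3) (2, 1) (2, 4)
print(np.allclose(manual, X_transformed))
```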
The following code raises an error:
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
union = FeatureUnion([
    ('scale', StandardScaler()),
    ('pca', PCA(n_components=3))
])
X_transformed = union.fit_transform([[1, 2], [3, 4], [5, 6]])
What is the likely cause of the error?
Solution
Step 1: Check input data shape
The input X = [[1, 2], [3, 4], [5, 6]] has shape (3, 2), meaning 3 samples and 2 features.
Step 2: Analyze PCA configuration
PCA(n_components=3) requests 3 components, but PCA allows at most min(n_samples, n_features) = 2 here, causing a ValueError.
Final Answer:
PCA cannot have n_components greater than input features -> Option A
Quick Check:
PCA n_components ≤ min(n_samples, n_features) [OK]
Common mistakes:
- Assuming StandardScaler needs 3D input
- Thinking FeatureUnion needs fit_predict
- Believing input must be DataFrame
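The failure and one possible fix can be reproduced as follows. The helper `build_union` and the idea of capping `n_components` at the smaller input dimension are illustrative choices made here, not part of the original exercise.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 2], [3, 4], [5, 6]])  # 3 samples, 2 features


def build_union(n_components):
    """Hypothetical helper: scaler plus PCA with the given component count."""
    return FeatureUnion([
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=n_components)),
    ])


# Requesting 3 components from 2 features fails inside PCA.
try:
    build_union(3).fit_transform(X)
    failed = False
except ValueError as exc:
    failed = True
    print("PCA rejected the request:", exc)

# Cap the request at the smaller of n_samples and n_features.
safe_components = min(3, *X.shape)
X_transformed = build_union(safe_components).fit_transform(X)
print(safe_components, X_transformed.shape)  # 2 (3, 4)
```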
You have a dataset with text and numeric columns and want to use TfidfVectorizer for text and StandardScaler for numeric data. How do you use FeatureUnion to prepare the data correctly?
Solution
Step 1: Understand data types and transformers
Text and numeric data need different preprocessing. TfidfVectorizer works on text, StandardScaler on numeric features.
Step 2: Use ColumnTransformer with FeatureUnion
FeatureUnion passes the full input to every transformer, so each branch must first select its own columns. Apply each transformer to the correct columns using ColumnTransformer, then combine the branches with FeatureUnion to merge the features.
Final Answer:
Use FeatureUnion with transformers for text and numeric, each applied to their columns via ColumnTransformer -> Option B
Quick Check:
Separate preprocessing per data type, then combine [OK]
Common mistakes:
- Applying wrong transformer to wrong data type
- Skipping column selection before FeatureUnion
- Trying to combine raw data without preprocessing
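A possible way to wire this up is sketched below. The DataFrame, its column names (`review`, `price`), and the branch names are hypothetical, invented for the example. Each branch is a `ColumnTransformer` that keeps only its own column, so neither transformer sees the wrong data type.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

# Hypothetical data: one text column and one numeric column.
df = pd.DataFrame({
    "review": ["good product", "bad product", "good value"],
    "price": [10.0, 20.0, 30.0],
})

# TfidfVectorizer expects a 1-D column of strings, so the column is given
# as a plain string; StandardScaler expects 2-D input, so it gets a list.
text_branch = ColumnTransformer([("tfidf", TfidfVectorizer(), "review")])
numeric_branch = ColumnTransformer([("scale", StandardScaler(), ["price"])])

union = FeatureUnion([
    ("text", text_branch),
    ("numeric", numeric_branch),
])

features = union.fit_transform(df)
# 4 distinct words (bad, good, product, value) + 1 scaled numeric column
print(features.shape)  # (3, 5)
```

A single `ColumnTransformer` holding both transformers would produce the same kind of combined matrix; the FeatureUnion form is shown because the question asks for it.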
