Feature union in ML Python - Model Metrics & Evaluation

Which metric matters for Feature Union and WHY

Feature union concatenates the outputs of several feature extractors into one wider feature matrix, with the goal of giving the model more information to learn from. No metric is specific to feature union: what matters is whether the downstream model predicts better with the combined features than without them. For classification, watch accuracy, precision, recall, and F1 score. For regression, watch mean squared error or R-squared. Comparing these metrics with and without the union shows whether the added features actually help.
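
As a rough illustration of the paragraph above, the sketch below scores a feature-union pipeline on a held-out test set. The synthetic dataset, the choice of StandardScaler plus PCA as the two branches, and the logistic regression classifier are all assumptions made for the example, not part of the original lesson.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used here only for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaled original features plus two principal components, side by side.
union = FeatureUnion([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=2)),
])
model = Pipeline([
    ("features", union),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Classification metrics computed on the held-out split only.
scores = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

For a regression task, the same structure would apply with a regressor and mean_squared_error or r2_score in place of the classification metrics.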

Confusion matrix example

Imagine a binary classification model using features combined by feature union. Here is a confusion matrix from test data:

      |                 | Predicted Positive      | Predicted Negative      |
      |-----------------|-------------------------|-------------------------|
      | Actual Positive | True Positive (TP): 50  | False Negative (FN): 10 |
      | Actual Negative | False Positive (FP): 5  | True Negative (TN): 35  |
    

Total samples = 50 + 10 + 5 + 35 = 100

From this matrix, we calculate:

  • Precision = TP / (TP + FP) = 50 / (50 + 5) = 0.91
  • Recall = TP / (TP + FN) = 50 / (50 + 10) = 0.83
  • Accuracy = (TP + TN) / Total = (50 + 35) / 100 = 0.85
  • F1 Score = 2 * (Precision * Recall) / (Precision + Recall) ≈ 0.87
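
The arithmetic in the list above can be reproduced in a few lines of plain Python. This is only a check of the worked numbers; the variable names are chosen for the example.

```python
# Counts taken from the confusion matrix above.
tp, fp, fn, tn = 50, 5, 10, 35

total = tp + fp + fn + tn
precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / total
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"accuracy={accuracy:.2f} f1={f1:.2f}")
# prints: precision=0.91 recall=0.83 accuracy=0.85 f1=0.87
```
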

Precision vs Recall tradeoff with Feature Union

When combining features, sometimes the model becomes better at finding positives (higher recall) but may also make more mistakes (lower precision). For example:

  • High precision means most predicted positives are correct. Useful when false alarms are costly, like spam filters.
  • High recall means most real positives are found. Important when missing positives is bad, like disease detection.

Feature union can help balance this by adding features that improve recall without hurting precision too much. But adding many noisy or redundant features can cause overfitting and lower both.
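
One way to see how a union shifts precision and recall is to cross-validate the same classifier with and without the extra features. The sketch below assumes a synthetic imbalanced dataset and uses PCA components as the added feature set; both are illustrative choices, and the union is not guaranteed to improve either metric.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with roughly 20% positives, for illustration only.
X, y = make_classification(
    n_samples=400, n_features=12, weights=[0.8, 0.2], random_state=0
)

def build(features):
    """Wrap a feature step and a classifier in one pipeline."""
    return Pipeline([
        ("features", features),
        ("clf", LogisticRegression(max_iter=1000)),
    ])

candidates = {
    "scaled only": build(StandardScaler()),
    "scaled + pca": build(FeatureUnion([
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=3)),
    ])),
}

results = {}
for name, model in candidates.items():
    cv = cross_validate(model, X, y, cv=5, scoring=["precision", "recall"])
    results[name] = {
        "precision": cv["test_precision"].mean(),
        "recall": cv["test_recall"].mean(),
    }
    print(name, results[name])
```

Reading the two rows side by side shows whether the added features moved recall, precision, both, or neither on this particular dataset.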

Good vs Bad metric values for Feature Union use

Good values (rough guides; suitable thresholds depend on the task):

  • Accuracy above baseline (better than simple model)
  • Precision and recall both above 0.8, showing balanced performance
  • F1 score close to or above 0.85, indicating good overall prediction

Bad values:

  • Accuracy close to random guess (e.g., 50% for balanced classes)
  • Precision very low (e.g., below 0.5), meaning many false positives
  • Recall very low (e.g., below 0.5), meaning many missed positives
  • F1 score low, showing poor balance between precision and recall

Common pitfalls with metrics in Feature Union

  • Accuracy paradox: High accuracy can be misleading if classes are imbalanced. Feature union might add features that help majority class only.
  • Data leakage: Fitting the transformers in a feature union on data that includes the test set, or building features from future information, inflates metrics falsely.
  • Overfitting: Adding too many features can make the model memorize training data, causing poor test performance.
  • Ignoring metric tradeoffs: Focusing only on accuracy without checking precision and recall can hide problems.
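
To illustrate the data leakage pitfall from the list above, here is a minimal sketch that fits the union on the training split only and then reuses the fitted union on the test split. The random data and the StandardScaler plus PCA branches are assumptions for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # synthetic data for illustration
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

union = FeatureUnion([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=2)),
])

# Fit on the training split only, then reuse the fitted union on the test split.
X_train_feats = union.fit_transform(X_train)
X_test_feats = union.transform(X_test)

# The scaler's statistics come from the training rows alone.
scaler = dict(union.transformer_list)["scale"]
print(np.allclose(scaler.mean_, X_train.mean(axis=0)))
print(X_train_feats.shape, X_test_feats.shape)
```

Placing the union inside a Pipeline together with the model gives the same guarantee during cross-validation, because the whole pipeline is refit on each training fold.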

Self-check question

Your model using feature union has 98% accuracy but only 12% recall on the positive class (e.g., fraud). Is it good for production? Why or why not?

Answer: No, it is not good. The low recall means the model misses most positive cases (fraud). Even though accuracy is high, it likely predicts most samples as negative. For fraud detection, missing fraud is very bad, so recall is more important than accuracy here.
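
The numbers in this scenario are easy to reproduce. The counts below are hypothetical, chosen only so that accuracy comes out at 98% and recall at 12%; they show that a model labelling every transaction as legitimate would score almost the same accuracy.

```python
# Hypothetical counts: 4,400 transactions, 100 of them fraud.
tp, fn = 12, 88      # fraud cases caught vs. missed
fp, tn = 0, 4300     # legitimate cases flagged vs. passed

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)

# A trivial baseline that labels every transaction as legitimate
# is correct on every actual negative and wrong on every fraud case.
baseline_accuracy = (fp + tn) / total

print(f"model accuracy={accuracy:.2%} recall={recall:.0%}")
# prints: model accuracy=98.00% recall=12%
print(f"always-negative baseline accuracy={baseline_accuracy:.2%}")
# prints: always-negative baseline accuracy=97.73%
```
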

Key Result
Feature union aims to improve model performance by combining features; key metrics like precision, recall, and F1 score reveal if this helps balance correct predictions and missed cases.

Practice

1. What is the main purpose of using FeatureUnion in machine learning?
easy
A. To combine multiple feature extraction methods into a single feature set
B. To split data into training and testing sets
C. To reduce the number of features by selecting the best ones
D. To train multiple models and average their predictions

Solution

  1. Step 1: Understand FeatureUnion's role

    FeatureUnion is used to combine different feature extraction methods so their outputs join into one feature set.
  2. Step 2: Compare with other options

    Splitting data, feature selection, and model averaging are different tasks not done by FeatureUnion.
  3. Final Answer:

    To combine multiple feature extraction methods into a single feature set -> Option A
  4. Quick Check:

    FeatureUnion = Combine features [OK]
Hint: FeatureUnion joins features, not data splits or models [OK]
Common Mistakes:
  • Confusing FeatureUnion with data splitting
  • Thinking it selects features instead of combining
  • Mixing it up with model ensemble methods
2. Which of the following is the correct way to create a FeatureUnion with two transformers named 'tf1' and 'tf2'?
easy
A. FeatureUnion(tf1=transformer1, tf2=transformer2)
B. FeatureUnion({'tf1': transformer1, 'tf2': transformer2})
C. FeatureUnion([('tf1', transformer1), ('tf2', transformer2)])
D. FeatureUnion(transformer1, transformer2)

Solution

  1. Step 1: Recall FeatureUnion syntax

    FeatureUnion expects a list of tuples, each tuple with a name and a transformer.
  2. Step 2: Check each option

    FeatureUnion([('tf1', transformer1), ('tf2', transformer2)]) uses a list of tuples correctly. Options A, B, and D pass keyword arguments, a dictionary, or bare positional transformers instead of a list of (name, transformer) tuples.
  3. Final Answer:

    FeatureUnion([('tf1', transformer1), ('tf2', transformer2)]) -> Option C
  4. Quick Check:

    FeatureUnion needs list of (name, transformer) tuples [OK]
Hint: Use list of (name, transformer) tuples for FeatureUnion [OK]
Common Mistakes:
  • Passing a dictionary instead of list of tuples
  • Passing transformers without names
  • Passing transformers as separate arguments
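
A short sketch of the construction syntax follows. StandardScaler and PCA stand in for the placeholder names transformer1 and transformer2 from the question; that substitution is an assumption for the example. The second half shows scikit-learn's make_union helper, which does accept positional transformers and names them automatically.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion, make_union
from sklearn.preprocessing import StandardScaler

# Explicit names: a list of (name, transformer) tuples.
union = FeatureUnion([
    ("tf1", StandardScaler()),
    ("tf2", PCA(n_components=1)),
])
names = [name for name, _ in union.transformer_list]
print(names)  # ['tf1', 'tf2']

# make_union takes the transformers as positional arguments and
# derives each name from the lowercased class name.
auto_union = make_union(StandardScaler(), PCA(n_components=1))
auto_names = [name for name, _ in auto_union.transformer_list]
print(auto_names)  # ['standardscaler', 'pca']
```
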
3. Given the code below, what will be the shape of X_transformed?
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])

union = FeatureUnion([
    ('scale', StandardScaler()),
    ('pca', PCA(n_components=1))
])

X_transformed = union.fit_transform(X)
medium
A. (2, 1)
B. (2, 3)
C. (2, 2)
D. (2, 4)

Solution

  1. Step 1: Analyze each transformer output

    StandardScaler keeps original shape (2 samples, 3 features) so output shape is (2,3). PCA with n_components=1 outputs (2,1).
  2. Step 2: Combine outputs with FeatureUnion

    FeatureUnion concatenates outputs horizontally: (2,3) + (2,1) = (2,4).
  3. Final Answer:

    (2, 4) -> Option D
  4. Quick Check:

    Concatenate (2,3) and (2,1) = (2,4) [OK]
Hint: FeatureUnion concatenates horizontally, sum feature counts [OK]
Common Mistakes:
  • Assuming PCA output replaces original features
  • Thinking FeatureUnion stacks vertically
  • Ignoring output shapes of individual transformers
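
The horizontal concatenation in question 3 can be checked by running each transformer on its own and stacking the results by hand. This sketch reuses the data and transformers from the question; the variable names scaled, reduced, and manual are introduced only for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 2, 3], [4, 5, 6]])

scaled = StandardScaler().fit_transform(X)      # shape (2, 3)
reduced = PCA(n_components=1).fit_transform(X)  # shape (2, 1)
manual = np.hstack([scaled, reduced])           # shape (2, 4)

union = FeatureUnion([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=1)),
])
X_transformed = union.fit_transform(X)

print(scaled.shape, reduced.shape, X_transformed.shape)
# prints: (2, 3) (2, 1) (2, 4)
```

The first three columns of X_transformed are the scaled features and the last column is the single principal component.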
4. You wrote this code but get an error:
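from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

union = FeatureUnion([
    ('scale', StandardScaler()),
    ('pca', PCA(n_components=3))
])

X_transformed = union.fit_transform([[1, 2], [3, 4], [5, 6]])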
What is the likely cause of the error?
medium
A. PCA cannot have n_components greater than input features
B. StandardScaler requires 3D input, but input is 2D
C. FeatureUnion requires transformers to have fit_predict method
D. Input data must be a pandas DataFrame, not a list

Solution

  1. Step 1: Check input data shape

    The input X = [[1,2],[3,4],[5,6]] has shape (3, 2), meaning 2 features.
  2. Step 2: Analyze PCA configuration

    PCA(n_components=3) requests 3 components, but n_components cannot exceed min(n_samples, n_features) = min(3, 2) = 2, so fitting raises a ValueError.
  3. Final Answer:

    PCA cannot have n_components greater than input features -> Option A
  4. Quick Check:

    PCA n_components ≤ features [OK]
Hint: Check PCA n_components ≤ number of features [OK]
Common Mistakes:
  • Assuming StandardScaler needs 3D input
  • Thinking FeatureUnion needs fit_predict
  • Believing input must be DataFrame
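
The failure in question 4 can be reproduced and contrasted with a valid setting. The helper try_union below is a hypothetical name introduced for this sketch; it returns either the output shape or the error message.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

X = [[1, 2], [3, 4], [5, 6]]  # 3 samples, 2 features

def try_union(n_components):
    """Fit the union and return the output shape, or the error text."""
    union = FeatureUnion([
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=n_components)),
    ])
    try:
        return union.fit_transform(X).shape
    except ValueError as exc:
        return f"ValueError: {exc}"

print(try_union(3))  # fails: more components than features
print(try_union(2))  # works: 2 scaled columns + 2 components
```
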
5. You want to combine text and numeric features for a model. You have a TfidfVectorizer for text and StandardScaler for numeric data. How do you use FeatureUnion to prepare the data correctly?
hard
A. Apply TfidfVectorizer and StandardScaler separately, then add their outputs manually
B. Use FeatureUnion with transformers for text and numeric, each applied to their columns via ColumnTransformer
C. Use FeatureUnion directly on raw data without preprocessing
D. Use StandardScaler on text data and TfidfVectorizer on numeric data

Solution

  1. Step 1: Understand data types and transformers

    Text and numeric data need different preprocessing. TfidfVectorizer works on text, StandardScaler on numeric features.
  2. Step 2: Use ColumnTransformer with FeatureUnion

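    FeatureUnion passes the same full input to every transformer, so each branch must first select its own columns. Use ColumnTransformer to route the text column to TfidfVectorizer and the numeric columns to StandardScaler, then merge the resulting features side by side.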
  3. Final Answer:

    Use FeatureUnion with transformers for text and numeric, each applied to their columns via ColumnTransformer -> Option B
  4. Quick Check:

    Separate preprocessing per data type, then combine [OK]
Hint: Preprocess each data type separately, then combine features [OK]
Common Mistakes:
  • Applying wrong transformer to wrong data type
  • Skipping column selection before FeatureUnion
  • Trying to combine raw data without preprocessing
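
A minimal sketch of the approach in question 5 is shown below, assuming the data lives in a pandas DataFrame. The column names review, price, and rating and the toy rows are hypothetical. The first version wraps each branch in its own ColumnTransformer inside a FeatureUnion; the second uses a single ColumnTransformer, which selects columns and concatenates the outputs in one step.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: one text column and two numeric columns.
df = pd.DataFrame({
    "review": ["great phone", "bad battery", "great battery"],
    "price": [300.0, 150.0, 220.0],
    "rating": [4.5, 2.0, 3.5],
})

# Each branch selects its own columns; unlisted columns are dropped by default.
# A plain string selects a 1-D column, which is what TfidfVectorizer expects.
text_branch = ColumnTransformer([("tfidf", TfidfVectorizer(), "review")])
numeric_branch = ColumnTransformer(
    [("scale", StandardScaler(), ["price", "rating"])]
)

union = FeatureUnion([("text", text_branch), ("numeric", numeric_branch)])
X_union = union.fit_transform(df)

# ColumnTransformer alone can also select and concatenate in one step.
combined = ColumnTransformer([
    ("tfidf", TfidfVectorizer(), "review"),
    ("scale", StandardScaler(), ["price", "rating"]),
])
X_combined = combined.fit_transform(df)

print(X_union.shape, X_combined.shape)  # (3, 6) (3, 6)
```

Both versions produce four TF-IDF columns (one per distinct word) followed by two scaled numeric columns.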