
How to Use FeatureUnion in sklearn with Python: Simple Guide

Use FeatureUnion in sklearn to combine several transformers or feature extraction pipelines into one. It fits each transformer independently on the same input (optionally in parallel, controlled by n_jobs) and concatenates their outputs side by side, so you can merge different feature sets before feeding them to a model.

📐

Syntax

The basic syntax of FeatureUnion is:

FeatureUnion(transformer_list, n_jobs=None, transformer_weights=None, verbose=False)

Here:

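transformer_list is a list of (name, transformer) tuples; each transformer can be a single feature extractor or a whole Pipeline.
n_jobs sets how many transformers are fitted in parallel (None means 1 job unless a joblib parallel backend is active; -1 uses all processors).
transformer_weights is an optional dict that maps transformer names to multiplicative weights applied to each transformer’s output.
verbose prints the time taken to fit each transformer if set to True.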

python

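from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

# Each entry is a (name, transformer) tuple; names must be unique
feature_union = FeatureUnion([
    ('scaled', StandardScaler()),
    ('pca', PCA(n_components=2))
], n_jobs=1, transformer_weights=None, verbose=False)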
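
To make the effect of transformer_weights concrete, here is a minimal sketch. The toy array and the names `plain` and `boosted` are assumptions made for illustration; both transformers are identity functions so that only the weights change the output.

```python
import numpy as np
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import FunctionTransformer

# Toy data, assumed for illustration
X_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0]])

# FunctionTransformer() with no function returns its input unchanged,
# which makes the effect of the weights easy to see
weighted_union = FeatureUnion(
    [('plain', FunctionTransformer()),
     ('boosted', FunctionTransformer())],
    transformer_weights={'plain': 1.0, 'boosted': 10.0},
)

X_weighted = weighted_union.fit_transform(X_demo)

# Columns 0-1 come from 'plain' (unchanged);
# columns 2-3 come from 'boosted' (multiplied by 10)
print(X_weighted)
```

The keys of the dict must match names used in transformer_list; transformers without an entry are left unweighted.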

💻

Example

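This example combines two feature pipelines: one selects the numeric columns, imputes missing values, and scales them; the other selects the categorical columns, imputes missing values, and one-hot encodes them. FeatureUnion passes the same full input to every transformer, so each pipeline has to pick out its own columns. The outputs are joined into one feature set.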

python

from sklearn.datasets import fetch_openml
from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

# Load sample data (downloads the Titanic dataset from OpenML)
X, y = fetch_openml('titanic', version=1, as_frame=True, return_X_y=True)

# Define numeric and categorical columns
numeric_features = ['age', 'fare']
categorical_features = ['sex', 'embarked']

# FeatureUnion passes the full DataFrame to every transformer,
# so each pipeline starts by selecting its own columns
def select_numeric(df):
    return df[numeric_features]

def select_categorical(df):
    return df[categorical_features]

# Numeric pipeline: select columns, impute missing values, and scale
numeric_pipeline = Pipeline([
    ('select', FunctionTransformer(select_numeric)),
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())
])

# Categorical pipeline: select columns, impute missing values, and one-hot encode
categorical_pipeline = Pipeline([
    ('select', FunctionTransformer(select_categorical)),
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

# Combine pipelines with FeatureUnion
combined_features = FeatureUnion([
    ('num', numeric_pipeline),
    ('cat', categorical_pipeline)
])

# Fit and transform data (every pipeline receives the full DataFrame)
X_num = numeric_pipeline.fit_transform(X)
X_cat = categorical_pipeline.fit_transform(X)
X_combined = combined_features.fit_transform(X)

print('Numeric features shape:', X_num.shape)
print('Categorical features shape:', X_cat.shape)
print('Combined features shape:', X_combined.shape)

Output

Numeric features shape: (1309, 2)
Categorical features shape: (1309, 5)
Combined features shape: (1309, 7)

⚠️

Common Pitfalls

Common mistakes when using FeatureUnion include:

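Passing a mixed-type dataset without selecting columns inside each transformer. FeatureUnion gives every transformer the full input, so a numeric-only transformer such as StandardScaler fails on string columns.
Calling transform before fit. Calling fit or fit_transform on the FeatureUnion fits every transformer it contains, so you do not need to fit them one by one.
Confusing FeatureUnion with ColumnTransformer, which is often better for column-wise transformations.
Forgetting that the result is a SciPy sparse matrix whenever any transformer returns one (for example OneHotEncoder), which matters if a later step requires a dense array.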
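
The sparse-output point can be checked with a small sketch. The single-column toy array and the names `scaled` and `onehot` are assumptions for illustration; the example only shows that one sparse branch makes the whole result sparse, and how to convert it when a dense array is needed.

```python
import numpy as np
from scipy import sparse
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy single-column data, assumed for illustration
X_toy = np.array([[0.0], [1.0], [2.0], [1.0]])

union = FeatureUnion([
    ('scaled', StandardScaler()),  # returns a dense array
    ('onehot', OneHotEncoder()),   # returns a sparse matrix by default
])

X_out = union.fit_transform(X_toy)

# One sparse branch is enough to make the combined result sparse
print(sparse.issparse(X_out))
print(X_out.shape)  # 1 scaled column + 3 one-hot columns

# Convert only if a later step really needs a dense array
X_dense = X_out.toarray()
```

For large one-hot encodings, keeping the sparse result is usually the cheaper choice, since toarray() materializes every zero.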

Usually, ColumnTransformer is preferred for column-based feature processing, but FeatureUnion is useful when combining different feature extraction methods that work on the whole dataset.

python

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

# Wrong: FeatureUnion passes every column to StandardScaler,
# so fitting fails if the data contains non-numeric columns
fu = FeatureUnion([
    ('scale', StandardScaler())
])

# Right: use ColumnTransformer to send only the numeric columns to the scaler
# (or start each FeatureUnion branch with a column-selection step)
ct = ColumnTransformer([
    ('scale', StandardScaler(), ['age', 'fare'])
])
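
For contrast, here is a sketch of the case where FeatureUnion fits naturally: two feature extraction methods that each work on the whole dataset, followed by a model. The choice of the iris dataset, PCA, SelectKBest, and LogisticRegression is an assumption made for illustration, not something prescribed by FeatureUnion.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

X_iris, y_iris = load_iris(return_X_y=True)

# Both extractors see all four iris columns
features = FeatureUnion([
    ('pca', PCA(n_components=2)),   # 2 principal components
    ('kbest', SelectKBest(k=1)),    # 1 best original feature
])

# The union is just another step in front of the classifier
model = Pipeline([
    ('features', features),
    ('clf', LogisticRegression(max_iter=1000)),
])
model.fit(X_iris, y_iris)

# Inspect the combined feature matrix: 2 PCA columns + 1 selected column
X_features = model.named_steps['features'].transform(X_iris)
print(X_features.shape)
```

Because the union sits inside the Pipeline, a single call to model.fit fits both extractors and the classifier together.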

📊

Quick Reference

FeatureUnion Cheat Sheet:

Parameter	Description
`transformer_list`	List of (name, transformer) pairs to combine
`n_jobs`	Number of parallel jobs (default None, which means 1)
`transformer_weights`	Dict of multiplicative weights keyed by transformer name (optional)
`verbose`	Show progress messages (default False)

Use fit and transform methods like other sklearn transformers.
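
As a minimal sketch of that fit/transform workflow, the following fits a union on training data and reuses it on new data. The random arrays and the transformer names are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

# Random toy data, assumed for illustration
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 3))
X_test = rng.normal(size=(5, 3))

union = FeatureUnion([
    ('scaled', StandardScaler()),
    ('pca', PCA(n_components=2)),
])

# Fit once on the training data; this fits every transformer in the union
union.fit(X_train)

# Reuse the fitted union on any data with the same columns
X_train_out = union.transform(X_train)
X_test_out = union.transform(X_test)

# 3 scaled columns + 2 PCA components = 5 output columns
print(X_train_out.shape, X_test_out.shape)
```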


✅

Key Takeaways

FeatureUnion combines multiple transformers by applying them in parallel and concatenating their outputs.

Each transformer in FeatureUnion should be a valid sklearn transformer with fit and transform methods.

Use FeatureUnion when you want to merge different feature extraction methods into one feature set.

For column-wise transformations, consider using ColumnTransformer as it is often simpler and safer.

Every transformer in a FeatureUnion receives the full input, so make sure each one selects, or can handle, the columns it is given.