How to Use FeatureUnion in sklearn with Python: Simple Guide
Use FeatureUnion in sklearn to combine multiple feature extraction pipelines into one. It applies each transformer in parallel and concatenates their outputs, allowing you to merge different feature sets before feeding them to a model.
Syntax
The basic syntax of FeatureUnion is:
FeatureUnion(transformer_list, n_jobs=None, transformer_weights=None, verbose=False)
Here:
- transformer_list is a list of (name, transformer) tuples; each transformer can be a pipeline or a feature extractor.
- n_jobs controls parallel processing (None means 1 job).
- transformer_weights is an optional dict that maps transformer names to a multiplier applied to that transformer's output.
- verbose prints the time taken to fit each transformer if set to True.
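```python
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Placeholder transformers; any objects with fit and transform work here
transformer1 = StandardScaler()
transformer2 = MinMaxScaler()

feature_union = FeatureUnion(
    [
        ('transformer1', transformer1),
        ('transformer2', transformer2),
    ],
    n_jobs=1,
    transformer_weights=None,
    verbose=False,
)
```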
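To make the effect of transformer_weights concrete, here is a minimal sketch on random data. The array X, the transformer names "scaled" and "pca", and the weight values are assumptions chosen for illustration only; they do not come from any real dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

# Illustrative random data: 20 samples, 4 numeric features
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))

union = FeatureUnion(
    [
        ("scaled", StandardScaler()),   # contributes 4 columns
        ("pca", PCA(n_components=2)),   # contributes 2 columns
    ],
    # Keys must match the names in the transformer list
    transformer_weights={"scaled": 1.0, "pca": 0.5},
)

X_out = union.fit_transform(X)
print(X_out.shape)  # (20, 6)
```

The last two columns of X_out are the PCA components multiplied by 0.5, while the first four are the unmodified scaled features.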
Example
This example shows how to combine two simple feature extractors: one that selects numeric columns and one that selects categorical columns, then applies different transformations to each. The outputs are joined into one feature set.
```python
from sklearn.datasets import fetch_openml
from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

# Load sample data (downloads the Titanic dataset from OpenML)
X, y = fetch_openml('titanic', version=1, as_frame=True, return_X_y=True)

# Define numeric and categorical columns
numeric_features = ['age', 'fare']
categorical_features = ['sex', 'embarked']


def select_columns(columns):
    """Return a transformer that keeps only the given DataFrame columns."""
    return FunctionTransformer(lambda df: df[columns])


# Numeric pipeline: select columns, impute missing values, and scale
numeric_pipeline = Pipeline([
    ('select', select_columns(numeric_features)),
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler()),
])

# Categorical pipeline: select columns, impute missing values, and one-hot encode
categorical_pipeline = Pipeline([
    ('select', select_columns(categorical_features)),
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore')),
])

# Combine pipelines with FeatureUnion
combined_features = FeatureUnion([
    ('num', numeric_pipeline),
    ('cat', categorical_pipeline),
])

# FeatureUnion passes the full DataFrame to every pipeline,
# so each pipeline selects its own columns first
X_num = numeric_pipeline.fit_transform(X)
X_cat = categorical_pipeline.fit_transform(X)
X_combined = combined_features.fit_transform(X)

print('Numeric features shape:', X_num.shape)
print('Categorical features shape:', X_cat.shape)
print('Combined features shape:', X_combined.shape)
```
Output
Numeric features shape: (1309, 2)
Categorical features shape: (1309, 5)
Combined features shape: (1309, 7)
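Since the combined features are meant to be fed to a model, the union can also sit as the first step of a Pipeline. The sketch below assumes a tiny made-up DataFrame, a hypothetical select helper for picking columns, and LogisticRegression as the model; none of these come from the example above, and the data is not real Titanic data.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

# Tiny made-up dataset, used only for illustration
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0],
    "fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.08],
    "sex": ["male", "female", "female", "female", "male", "male"],
})
y = [0, 1, 1, 1, 0, 0]


def select(columns):
    """Hypothetical helper: keep only the given DataFrame columns."""
    return FunctionTransformer(lambda frame: frame[columns])


features = FeatureUnion([
    ("num", Pipeline([
        ("select", select(["age", "fare"])),
        ("scale", StandardScaler()),
    ])),
    ("cat", Pipeline([
        ("select", select(["sex"])),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])),
])

# The union is the first step, the classifier the second
model = Pipeline([
    ("features", features),
    ("clf", LogisticRegression()),
])

model.fit(df, y)
predictions = model.predict(df)
print(predictions.shape)  # (6,)
```

Placing the union inside the Pipeline means that fitting the model also fits every transformer, so the same feature steps are applied at prediction time.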
Common Pitfalls
Common mistakes when using FeatureUnion include:
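- Passing raw data with mixed column types: FeatureUnion hands the same full input to every transformer, so each transformer must select the columns it needs or it will raise errors.
- Calling transform before the FeatureUnion has been fitted, which raises an error because none of its transformers are fitted yet.
- Confusing FeatureUnion with ColumnTransformer, which is often better for column-wise transformations.
- Ignoring sparse outputs: if any transformer returns a sparse matrix, the combined result is sparse too, which can break later steps that expect a dense array.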
Usually, ColumnTransformer is preferred for column-based feature processing, but FeatureUnion is useful when combining different feature extraction methods that work on the whole dataset.
```python
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

# Wrong: FeatureUnion hands the full dataset to StandardScaler,
# which fails if any column is non-numeric
fu = FeatureUnion([
    ('scale', StandardScaler())
])

# Right: use ColumnTransformer (or wrap each transformer in a pipeline
# that first selects the correct columns)
ct = ColumnTransformer([
    ('scale', StandardScaler(), ['age', 'fare'])
])
```
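The sparse-output pitfall can be checked directly. This is a minimal sketch, assuming a small hand-written numeric array and a union of one dense transformer (StandardScaler) and one that returns a sparse matrix by default (OneHotEncoder); the names "scale" and "onehot" are illustrative.

```python
import numpy as np
from scipy import sparse
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative data: column 0 has 3 distinct values, column 1 has 2
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]])

union = FeatureUnion([
    ("scale", StandardScaler()),   # dense output, 2 columns
    ("onehot", OneHotEncoder()),   # sparse output by default, 3 + 2 columns
])

X_out = union.fit_transform(X)
print(sparse.issparse(X_out))  # True

# Convert explicitly if a later step needs a dense array
X_dense = X_out.toarray()
print(X_dense.shape)  # (3, 7)
```

Converting with toarray is fine for small data like this; for wide one-hot or text features it may use a lot of memory, so keeping the matrix sparse is often the better choice when the downstream estimator supports it.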
Quick Reference
FeatureUnion Cheat Sheet:
| Parameter | Description |
|---|---|
| transformer_list | List of (name, transformer) pairs to combine |
| n_jobs | Number of parallel jobs (default None, which means 1) |
| transformer_weights | Weights for each transformer output (optional) |
| verbose | Show progress messages (default False) |

Use fit and transform methods like other sklearn transformers.
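As a sketch of the fit and transform workflow, the snippet below fits a union on training data and reuses it on test data. The random arrays and the transformer names are assumptions made for illustration. It also shows that calling transform on an unfitted union raises NotFittedError.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.exceptions import NotFittedError
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

# Illustrative random train/test split with 3 features
rng = np.random.default_rng(42)
X_train = rng.normal(size=(30, 3))
X_test = rng.normal(size=(10, 3))

union = FeatureUnion([
    ("scaled", StandardScaler()),
    ("pca", PCA(n_components=2)),
])

# Transforming before fitting fails
raised_not_fitted = False
try:
    union.transform(X_test)
except NotFittedError:
    raised_not_fitted = True
print(raised_not_fitted)  # True

# Fit on the training data only, then reuse the fitted union on test data
union.fit(X_train)
X_test_out = union.transform(X_test)
print(X_test_out.shape)  # (10, 5)
```

Fitting only on the training split keeps statistics such as the scaler's mean and the PCA components from being influenced by the test data.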
Key Takeaways
FeatureUnion combines multiple transformers by applying them in parallel and concatenating their outputs.
Each transformer in FeatureUnion should be a valid sklearn transformer with fit and transform methods.
Use FeatureUnion when you want to merge different feature extraction methods into one feature set.
For column-wise transformations, consider using ColumnTransformer as it is often simpler and safer.
Always ensure each transformer receives input it can handle, such as the right columns and data types, to avoid errors.