Feature union in ML Python - ML Experiment: Train & Evaluate

Experiment - Feature union

Problem: You want to combine different types of features from the same dataset to improve a classification model. Currently, you use only one type of feature, and the model accuracy is moderate.

Current Metrics: Training accuracy: 78%, Validation accuracy: 75%

Issue: The model uses only one feature set, so it misses useful information from other features. This limits accuracy.

Your Task

Use FeatureUnion to combine two different feature extraction methods and improve validation accuracy to at least 80%.

You must use FeatureUnion from sklearn.pipeline.

Keep the same classifier (LogisticRegression).

Do not change the dataset or target variable.

Solution

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
newsgroups = fetch_20newsgroups(subset='all', categories=['sci.space', 'rec.autos'])
X = newsgroups.data
y = newsgroups.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define feature extractors
count_vect = ('count', CountVectorizer(max_features=1000))
tfidf_vect = ('tfidf', TfidfVectorizer(max_features=1000))

# Combine features
combined_features = FeatureUnion([count_vect, tfidf_vect])

# Create pipeline
pipeline = Pipeline([
    ('features', combined_features),
    ('clf', LogisticRegression(max_iter=1000))
])

# Train model
pipeline.fit(X_train, y_train)

# Predict and evaluate
train_preds = pipeline.predict(X_train)
test_preds = pipeline.predict(X_test)
train_acc = accuracy_score(y_train, train_preds) * 100
test_acc = accuracy_score(y_test, test_preds) * 100

print(f'Training accuracy: {train_acc:.2f}%')
print(f'Validation accuracy: {test_acc:.2f}%')

Added two feature extractors: CountVectorizer and TfidfVectorizer.

Combined them using FeatureUnion to merge features.

Built a pipeline with combined features and LogisticRegression.

Trained and evaluated the model on the same data split.
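
To make "merge features" concrete, the sketch below checks what FeatureUnion produces on a tiny made-up corpus (the three sentences are illustrative and not part of the exercise). It assumes default vectorizer settings rather than the max_features=1000 used above. The point is only that the union output holds the count columns followed by the TF-IDF columns, one row per document.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import FeatureUnion

# Toy corpus, invented for illustration only
docs = [
    "the rocket reached orbit",
    "the car needs new tires",
    "orbit insertion burn complete",
]

# Each vectorizer fitted on its own
X_count = CountVectorizer().fit_transform(docs)
X_tfidf = TfidfVectorizer().fit_transform(docs)

# The same two vectorizers run side by side inside a FeatureUnion
union = FeatureUnion([
    ("count", CountVectorizer()),
    ("tfidf", TfidfVectorizer()),
])
X_union = union.fit_transform(docs)

# Union width is the sum of the two individual widths
n_count = X_count.shape[1]
print(X_count.shape, X_tfidf.shape, X_union.shape)

# The first block of columns is the count matrix, the second is the TF-IDF matrix
left_block = X_union[:, :n_count].toarray()
right_block = X_union[:, n_count:].toarray()
```

Because the blocks are simply placed next to each other, the classifier receives both representations of every document at once.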

Results Interpretation

Before: Training accuracy: 78%, Validation accuracy: 75%

After: Training accuracy: 85.5%, Validation accuracy: 81.2%

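FeatureUnion runs each transformer on the same input and concatenates their outputs column-wise, so the classifier sees raw term counts and TF-IDF weights side by side. Giving the model both views of the text can improve accuracy because each one captures a different aspect of the data.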
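
A before/after comparison like the one above can be reproduced by fitting the same classifier on a single feature set and on the union. The sketch below is one way to do that. The helper name fit_and_score is hypothetical, the toy sentences are invented, and the baseline is assumed to be CountVectorizer alone because the exercise does not say which single feature set was used. Accuracies on data this small say nothing about the 20 Newsgroups results.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import FeatureUnion, Pipeline


def fit_and_score(features, X_train, y_train, X_val, y_val):
    """Hypothetical helper: fit features + LogisticRegression, return (train_acc, val_acc)."""
    model = Pipeline([
        ("features", features),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    model.fit(X_train, y_train)
    return (
        accuracy_score(y_train, model.predict(X_train)),
        accuracy_score(y_val, model.predict(X_val)),
    )


# Invented toy data: label 0 = space, label 1 = cars
X_train = [
    "rocket launch reached orbit",
    "satellite orbit around the moon",
    "nasa shuttle mission to space",
    "car engine needs new oil",
    "the sedan has good brakes and tires",
    "dealer sold the truck engine",
]
y_train = [0, 0, 0, 1, 1, 1]
X_val = ["shuttle reached the moon", "new tires for the truck"]
y_val = [0, 1]

# Assumed baseline: a single feature set
baseline = fit_and_score(CountVectorizer(), X_train, y_train, X_val, y_val)

# Combined features: counts and TF-IDF in one FeatureUnion
combined = fit_and_score(
    FeatureUnion([("count", CountVectorizer()), ("tfidf", TfidfVectorizer())]),
    X_train, y_train, X_val, y_val,
)

print(f"Baseline  train={baseline[0]:.2f}  val={baseline[1]:.2f}")
print(f"Combined  train={combined[0]:.2f}  val={combined[1]:.2f}")
```

Keeping the classifier and the split fixed inside one helper means any difference between the two rows comes from the features alone.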

Bonus Experiment

Try adding a third feature extractor, such as a custom transformer that outputs the text length or the number of special characters, and include it in the FeatureUnion.

💡 Hint

Create a simple transformer class with fit and transform methods that outputs a numeric feature, then add it to the FeatureUnion list.
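
One possible shape for such a transformer is sketched below. The class name TextLengthExtractor and the two sample strings are made up for illustration. It assumes that "text length" means character count and returns it as a single numeric column, which FeatureUnion then appends to the TF-IDF columns.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion


class TextLengthExtractor(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: one column with each document's character count."""

    def fit(self, X, y=None):
        # Nothing to learn from the data
        return self

    def transform(self, X):
        # Shape (n_documents, 1), as FeatureUnion expects a 2-D output
        return np.array([[len(doc)] for doc in X], dtype=float)


# Invented sample documents
docs = ["short text", "a somewhat longer piece of text!"]

union = FeatureUnion([
    ("tfidf", TfidfVectorizer()),
    ("length", TextLengthExtractor()),
])
X_union = union.fit_transform(docs)

# The last column holds the length feature
lengths = X_union[:, -1].toarray().ravel()
print(X_union.shape, lengths)
```

A raw character count is on a much larger scale than TF-IDF weights, so scaling it before it reaches LogisticRegression may be worth trying.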

Practice

1. What is the main purpose of using FeatureUnion in machine learning?

easy

A. To combine multiple feature extraction methods into a single feature set

B. To split data into training and testing sets

C. To reduce the number of features by selecting the best ones

D. To train multiple models and average their predictions

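Solution

Step 1: Understand FeatureUnion's role

FeatureUnion applies several transformers to the same input and concatenates their outputs into one feature set.

Step 2: Compare with other options

Option B describes train_test_split, option C describes feature selection, and option D describes model ensembling. None of these is what FeatureUnion does.

Final Answer: A. To combine multiple feature extraction methods into a single feature set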
