What is Stacking and blending in ML Python?

ML Pythonml~7 mins

Stacking and blending in ML Python

Choose your learning style10 modes available

Learn Why Deep Model Try Challenge Experiment Recall Metrics

Start learning this pattern below

Jump into concepts and practice - no test required

Recommended

Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong

Introduction

Stacking and blending help combine many simple models to make one stronger model. This often gives better guesses than any single model alone.

When you want to improve prediction accuracy by using multiple models together.

When you have different types of models that each do well on parts of the problem.

When you want to reduce mistakes by averaging out errors from different models.

When you participate in competitions where small improvements matter.

When you want a more reliable prediction by combining strengths of many models.

Syntax

ML Python

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import StackingClassifier

# Define base models
base_models = [
    ('rf', RandomForestClassifier()),
    ('lr', LogisticRegression(max_iter=1000))
]

# Define stacking model
stacking_model = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000)
)

# Fit model
stacking_model.fit(X_train, y_train)

# Predict
predictions = stacking_model.predict(X_test)

Stacking uses base models to make predictions, then a final model learns from these predictions.

Blending is similar but uses a holdout set to train the final model instead of cross-validation.

Examples

Stacking with two base models and logistic regression as final model.

ML Python

from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

base_models = [
    ('dt', DecisionTreeClassifier(max_depth=3)),
    ('lr', LogisticRegression(max_iter=1000))
]
stacking = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000)
)
stacking.fit(X_train, y_train)
predictions = stacking.predict(X_test)

Blending uses a holdout set to train the final model instead of cross-validation.

ML Python

# Blending example (conceptual)
# Split training data into train and holdout
X_train_main, X_holdout, y_train_main, y_holdout = train_test_split(X_train, y_train, test_size=0.2)

# Train base models on X_train_main
# Predict on X_holdout
# Use predictions on X_holdout as features to train final model
# Predict on test data using base models and final model

Sample Model

This program loads the iris flower data, splits it into training and test sets, trains two base models (random forest and gradient boosting), then stacks them using logistic regression as the final model. It prints the accuracy on the test set.

ML Python

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define base models
base_models = [
    ('rf', RandomForestClassifier(random_state=42)),
    ('gb', GradientBoostingClassifier(random_state=42))
]

# Define stacking model
stacking_model = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5
)

# Train stacking model
stacking_model.fit(X_train, y_train)

# Predict
y_pred = stacking_model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Stacking model accuracy: {accuracy:.2f}")

OutputSuccess

Important Notes

Stacking usually improves accuracy but can be slower to train because it trains multiple models.

Blending is simpler but may waste some training data for the holdout set.

Common mistake: Not using cross-validation or holdout properly can cause overfitting in stacking/blending.

Summary

Stacking and blending combine multiple models to make better predictions.

Stacking uses cross-validation to train a final model on base model predictions.

Blending uses a holdout set instead of cross-validation for the final model training.

Practice

(1/5)

1. What is the main goal of stacking and blending in machine learning?

easy

A. To combine multiple models to improve prediction accuracy

B. To reduce the size of the dataset

C. To speed up training by using fewer models

D. To replace all base models with a single model

Stacking and blending in ML Python

Start learning this pattern below

Practice

Solution

Step 1: Understand the purpose of stacking and blending

Step 2: Identify the goal of combining models

Final Answer:

Quick Check:

Solution

Step 1: Recall stacking training method

Step 2: Compare options to stacking method

Final Answer:

Quick Check:

Solution

Step 1: Calculate holdout set size

Step 2: Determine shape of base model predictions

Final Answer:

Quick Check:

Solution

Step 1: Understand cross_val_predict output

Step 2: Identify cause of inconsistent sample sizes

Final Answer:

Quick Check:

Solution

Step 1: Understand blending process

Step 2: Evaluate options against blending steps

Final Answer:

Quick Check: