How to Use Stacking Classifier in sklearn with Python
Use StackingClassifier from sklearn.ensemble to combine several base models with a final estimator that learns from their outputs. Define the base models as a list of (name, model) tuples, set a final estimator, then fit and predict like any other sklearn model.

Syntax

The StackingClassifier is initialized with a list of base estimators and a final estimator. You fit it on training data and use it to predict new data.
- estimators: list of (name, model) tuples, e.g. [('lr', LogisticRegression()), ('rf', RandomForestClassifier())]
- final_estimator: the model that learns from the base models' outputs (defaults to LogisticRegression)
- fit(X, y): train the stacking model on features X and labels y
- predict(X): predict labels for new data X
```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

estimators = [
    ('dt', DecisionTreeClassifier()),
    ('svm', SVC(probability=True))
]

stacking_clf = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression()
)

stacking_clf.fit(X_train, y_train)
predictions = stacking_clf.predict(X_test)
```
Example
This example shows how to use StackingClassifier with two base models and a logistic regression as the final estimator on the Iris dataset. It trains the model and prints the accuracy on test data.
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Define base models
estimators = [
    ('dt', DecisionTreeClassifier(random_state=42)),
    ('svm', SVC(probability=True, random_state=42))
]

# Create stacking classifier
stacking_clf = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression()
)

# Train
stacking_clf.fit(X_train, y_train)

# Predict
y_pred = stacking_clf.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Stacking Classifier Accuracy: {accuracy:.2f}")
```
Output
Stacking Classifier Accuracy: 0.98
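A single train/test split can over- or under-estimate performance on a small dataset like Iris. As a sketch using the same models as the example above, cross_val_score gives a cross-validated accuracy instead:

```python
# Sketch: evaluating the same stacking setup with 5-fold
# cross-validation instead of a single train/test split
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

estimators = [
    ('dt', DecisionTreeClassifier(random_state=42)),
    ('svm', SVC(probability=True, random_state=42))
]
stacking_clf = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression()
)

# One accuracy score per fold; the mean is a more stable estimate
scores = cross_val_score(stacking_clf, X, y, cv=5)
print(f"Mean CV accuracy: {scores.mean():.2f}")
```

Note that StackingClassifier already uses internal cross-validation (its cv parameter) to produce the base-model outputs the final estimator trains on; cross_val_score here adds an outer loop for evaluation.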
Common Pitfalls
- Assuming base models must provide probabilities: with the default stack_method='auto', StackingClassifier falls back to decision_function or predict, so SVC() works even without probability=True. You only need probability=True if you explicitly set stack_method='predict_proba'.
- Using incompatible models: base models must implement fit and predict.
- Ignoring data splits: always split data into train and test sets to evaluate stacking properly.
- Overfitting the final estimator: use cross-validation or tune hyperparameters to avoid overfitting.
```python
from sklearn.svm import SVC

# Works with the default stack_method='auto': StackingClassifier
# falls back to SVC's decision_function when predict_proba is missing
svc = SVC()

# probability=True is only required when forcing probability outputs,
# e.g. StackingClassifier(..., stack_method='predict_proba')
svc_proba = SVC(probability=True)
```
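To see the fallback in action, this minimal sketch fits a stacking model with a plain SVC() and inspects the fitted stack_method_ attribute, which records which output method was used for each base model:

```python
# Sketch: SVC without probability=True still works inside
# StackingClassifier because stack_method='auto' (the default)
# falls back to decision_function
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

clf = StackingClassifier(
    estimators=[('svm', SVC())],  # no probability=True
    final_estimator=LogisticRegression()
)
clf.fit(X, y)  # fits without error

# Shows which method each base model contributed to the final estimator
print(clf.stack_method_)
```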
Quick Reference
Remember these key points when using StackingClassifier:
- Base models list: estimators=[('name', model), ...]
- Final estimator: model that learns from base models' outputs
- Set probability=True on models like SVC only when forcing stack_method='predict_proba'
- Use fit and predict like other sklearn models
- Evaluate with a train/test split or cross-validation
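The "tune hyperparameters" advice can be sketched with GridSearchCV: nested parameters of a StackingClassifier follow sklearn's double-underscore convention, so final_estimator__C reaches the logistic regression's regularization strength. The grid values here are illustrative:

```python
# Sketch: tuning the final estimator inside a StackingClassifier
# via sklearn's nested parameter names (final_estimator__C)
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

stacking_clf = StackingClassifier(
    estimators=[('dt', DecisionTreeClassifier(random_state=42))],
    final_estimator=LogisticRegression()
)

grid = GridSearchCV(
    stacking_clf,
    param_grid={'final_estimator__C': [0.1, 1.0, 10.0]},
    cv=3
)
grid.fit(X, y)
print(grid.best_params_)
```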
Key Takeaways
Use StackingClassifier to combine multiple models for improved prediction.
Set probability=True on base models like SVC only when you force stack_method='predict_proba'; the default stack_method='auto' falls back to decision_function.
Define base estimators as a list of (name, model) tuples.
Fit the stacking model on training data and predict on new data.
Evaluate performance with proper train/test splits to avoid overfitting.