
How to Use Stacking Classifier in sklearn with Python

Use StackingClassifier from sklearn.ensemble to combine several base models and a final estimator that learns from their outputs. Define base models in a list of tuples, set a final estimator, then fit and predict like any sklearn model.
📐

Syntax

The StackingClassifier is initialized with a list of base estimators and a final estimator. You fit it on training data and use it to predict new data.

  • estimators: List of tuples with a name and a model (e.g., [('lr', LogisticRegression()), ('rf', RandomForestClassifier())])
  • final_estimator: The model that learns from base models' outputs (default is LogisticRegression)
  • fit(X, y): Train the stacking model on features X and labels y
  • predict(X): Predict labels for new data X
python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

estimators = [
    ('dt', DecisionTreeClassifier()),
    ('svm', SVC(probability=True))
]

stacking_clf = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression()
)

# Fit and predict like any sklearn estimator
# (X_train, y_train, X_test are assumed to be defined)
stacking_clf.fit(X_train, y_train)
predictions = stacking_clf.predict(X_test)
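Beyond estimators and final_estimator, a few optional constructor parameters are worth knowing: cv, stack_method, and passthrough. The sketch below shows them with their default values, using synthetic data from make_classification purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# Toy binary-classification data, for illustration only
X, y = make_classification(n_samples=200, random_state=0)

stacking_clf = StackingClassifier(
    estimators=[
        ('dt', DecisionTreeClassifier(random_state=0)),
        ('svm', SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,                 # folds used to build out-of-fold meta-features
    stack_method='auto',  # tries predict_proba, then decision_function, then predict
    passthrough=False,    # if True, raw features are also fed to the final estimator
)

stacking_clf.fit(X, y)
print(stacking_clf.predict(X[:5]))
```

Setting passthrough=True can help when the final estimator benefits from seeing the original features alongside the base models' predictions.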
💻

Example

This example shows how to use StackingClassifier with two base models and a logistic regression as the final estimator on the Iris dataset. It trains the model and prints the accuracy on held-out test data.

python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define base models
estimators = [
    ('dt', DecisionTreeClassifier(random_state=42)),
    ('svm', SVC(probability=True, random_state=42))
]

# Create stacking classifier
stacking_clf = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression()
)

# Train
stacking_clf.fit(X_train, y_train)

# Predict
y_pred = stacking_clf.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Stacking Classifier Accuracy: {accuracy:.2f}")
Output
Stacking Classifier Accuracy: 0.98
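To see what stacking buys you, it can help to score each base model on its own and compare against the stacked ensemble. This is an illustrative sketch reusing the same Iris split; the exact numbers depend on the split and model settings, so none are hard-coded here.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

base_models = [
    ('dt', DecisionTreeClassifier(random_state=42)),
    ('svm', SVC(probability=True, random_state=42)),
]

# Score each base model individually
for name, model in base_models:
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.2f}")

# Score the stacked ensemble of the same models
stacking_clf = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
)
stacking_clf.fit(X_train, y_train)
stacked_acc = accuracy_score(y_test, stacking_clf.predict(X_test))
print(f"stacked: {stacked_acc:.2f}")
```

Stacking is not guaranteed to beat every base model on every split, but it often matches or exceeds the strongest one.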
⚠️

Common Pitfalls

  • Assuming probability=True is always required for models like SVC: by default, StackingClassifier uses stack_method='auto', which falls back to decision_function (and then predict) when predict_proba is unavailable, so a plain SVC works. probability=True is only required if you explicitly set stack_method='predict_proba'.
  • Using incompatible models: base models must be scikit-learn-compatible estimators that implement fit and predict.
  • Ignoring data splits: always hold out a test set (or use cross-validation) to evaluate the stacked model on unseen data.
  • Overfitting the final estimator: StackingClassifier already trains it on out-of-fold predictions (cv=5 by default), but you should still tune hyperparameters against held-out data rather than the training set.
python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

estimators = [('svm', SVC())]

# Works: the default stack_method='auto' falls back to
# decision_function when predict_proba is unavailable
clf_ok = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression()
)

# Fails at fit time: predict_proba is requested explicitly,
# but SVC() does not expose it without probability=True
clf_error = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(),
    stack_method='predict_proba'
)

# Right way when you explicitly want predicted probabilities:
clf_proba = StackingClassifier(
    estimators=[('svm', SVC(probability=True))],
    final_estimator=LogisticRegression(),
    stack_method='predict_proba'
)
📊

Quick Reference

Remember these key points when using StackingClassifier:

  • Base models list: estimators=[('name', model), ...]
  • Final estimator: model that learns from base models' outputs
  • Set probability=True on models like SVC only when you request stack_method='predict_proba'; the default 'auto' falls back to decision_function
  • Use fit and predict like other sklearn models
  • Evaluate with train/test split or cross-validation
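The last point can be sketched with cross_val_score, which cross-validates the entire stacked pipeline rather than a single split. This is an illustrative snippet reusing the Iris data from the example above; the printed scores will vary with model settings.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

stacking_clf = StackingClassifier(
    estimators=[
        ('dt', DecisionTreeClassifier(random_state=42)),
        ('svm', SVC(probability=True, random_state=42)),
    ],
    final_estimator=LogisticRegression(),
)

# 5-fold cross-validated accuracy of the whole stacked model
scores = cross_val_score(stacking_clf, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```

Because each fold refits the full stack, this is slower than a single train/test split, but it gives a more reliable estimate of generalization.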

Key Takeaways

Use StackingClassifier to combine multiple models for improved prediction.
Set probability=True on base models like SVC only when you explicitly use stack_method='predict_proba'; the default stack_method='auto' works without it.
Define base estimators as a list of (name, model) tuples.
Fit the stacking model on training data and predict on new data.
Evaluate performance with proper train/test splits to avoid overfitting.