How to Use Stacking Classifier in sklearn with Python
Use StackingClassifier from sklearn.ensemble to combine several base models with a final estimator that learns from their outputs. Define the base models as a list of (name, model) tuples, set a final estimator, then fit and predict like any other sklearn model.

Syntax

The StackingClassifier is initialized with a list of base estimators and a final estimator. You fit it on training data and use it to predict new data.
- estimators: list of (name, model) tuples, e.g. [('lr', LogisticRegression()), ('rf', RandomForestClassifier())]
- final_estimator: the model that learns from the base models' outputs (defaults to LogisticRegression)
- fit(X, y): train the stacking model on features X and labels y
- predict(X): predict labels for new data X
```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

estimators = [
    ('dt', DecisionTreeClassifier()),
    ('svm', SVC(probability=True))
]

stacking_clf = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression()
)

stacking_clf.fit(X_train, y_train)
predictions = stacking_clf.predict(X_test)
```
Example
This example shows how to use StackingClassifier with two base models and a logistic regression as the final estimator on the Iris dataset. It trains the model and prints the accuracy on test data.
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Define base models
estimators = [
    ('dt', DecisionTreeClassifier(random_state=42)),
    ('svm', SVC(probability=True, random_state=42))
]

# Create stacking classifier
stacking_clf = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression()
)

# Train
stacking_clf.fit(X_train, y_train)

# Predict
y_pred = stacking_clf.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Stacking Classifier Accuracy: {accuracy:.2f}")
```
Output
Stacking Classifier Accuracy: 0.98
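A single train/test split can over- or under-estimate performance on a small dataset like Iris. As a sketch using the same models as the example above, cross_val_score gives a cross-validated accuracy instead:

```python
# Sketch: evaluating the same stacking setup with 5-fold
# cross-validation instead of a single train/test split
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

estimators = [
    ('dt', DecisionTreeClassifier(random_state=42)),
    ('svm', SVC(probability=True, random_state=42))
]
stacking_clf = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression()
)

# One accuracy score per fold; the mean is a more stable estimate
scores = cross_val_score(stacking_clf, X, y, cv=5)
print(f"Mean CV accuracy: {scores.mean():.2f}")
```

Note that StackingClassifier already uses internal cross-validation (its cv parameter) to produce the base-model outputs the final estimator trains on; cross_val_score here adds an outer loop for evaluation.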
Common Pitfalls
- Assuming base models must provide probabilities: with the default stack_method='auto', StackingClassifier falls back to decision_function or predict, so SVC() works even without probability=True. You only need probability=True if you explicitly set stack_method='predict_proba'.
- Using incompatible models: base models must implement fit and predict.
- Ignoring data splits: always split data into train and test sets to evaluate stacking properly.
- Overfitting the final estimator: use cross-validation or tune hyperparameters to avoid overfitting.
```python
from sklearn.svm import SVC

# Works with the default stack_method='auto': StackingClassifier
# falls back to SVC's decision_function when predict_proba is missing
svc = SVC()

# probability=True is only required when forcing probability outputs,
# e.g. StackingClassifier(..., stack_method='predict_proba')
svc_proba = SVC(probability=True)
```
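To see the fallback in action, this minimal sketch fits a stacking model with a plain SVC() and inspects the fitted stack_method_ attribute, which records which output method was used for each base model:

```python
# Sketch: SVC without probability=True still works inside
# StackingClassifier because stack_method='auto' (the default)
# falls back to decision_function
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

clf = StackingClassifier(
    estimators=[('svm', SVC())],  # no probability=True
    final_estimator=LogisticRegression()
)
clf.fit(X, y)  # fits without error

# Shows which method each base model contributed to the final estimator
print(clf.stack_method_)
```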
Quick Reference
Remember these key points when using StackingClassifier:
- Base models list: estimators=[('name', model), ...]
- Final estimator: model that learns from base models' outputs
- Set probability=True on models like SVC only when forcing stack_method='predict_proba'
- Use fit and predict like other sklearn models
- Evaluate with a train/test split or cross-validation
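The "tune hyperparameters" advice can be sketched with GridSearchCV: nested parameters of a StackingClassifier follow sklearn's double-underscore convention, so final_estimator__C reaches the logistic regression's regularization strength. The grid values here are illustrative:

```python
# Sketch: tuning the final estimator inside a StackingClassifier
# via sklearn's nested parameter names (final_estimator__C)
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

stacking_clf = StackingClassifier(
    estimators=[('dt', DecisionTreeClassifier(random_state=42))],
    final_estimator=LogisticRegression()
)

grid = GridSearchCV(
    stacking_clf,
    param_grid={'final_estimator__C': [0.1, 1.0, 10.0]},
    cv=3
)
grid.fit(X, y)
print(grid.best_params_)
```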
Key Takeaways
Use StackingClassifier to combine multiple models for improved prediction.
Set probability=True on base models like SVC only when you force stack_method='predict_proba'; the default stack_method='auto' falls back to decision_function.
Define base estimators as a list of (name, model) tuples.
Fit the stacking model on training data and predict on new data.
Evaluate performance with proper train/test splits to avoid overfitting.