How to Use Voting Classifier in sklearn with Python
Use VotingClassifier from sklearn.ensemble to combine multiple models by passing them as estimators and choosing a voting method, either 'hard' or 'soft'. Fit the voting classifier on your training data, then use it to predict or to evaluate performance.

Syntax
The VotingClassifier combines several base models to improve prediction accuracy. You provide a list of (name, model) pairs as estimators. The voting parameter controls how predictions are combined: 'hard' uses majority voting, and 'soft' averages predicted probabilities.
- estimators: List of (name, model) tuples.
- voting: 'hard' or 'soft' (default is 'hard').
- weights: Optional list giving different importance to models.
python
from sklearn.ensemble import VotingClassifier

voting_clf = VotingClassifier(
    estimators=[('model1', model1), ('model2', model2), ('model3', model3)],
    voting='hard',
    weights=None
)
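With voting='soft', the ensemble also exposes predict_proba, returning the averaged class probabilities. A minimal sketch, using a synthetic dataset and two probability-capable base models chosen for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Toy two-class dataset for illustration
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

soft_clf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=200)),
                ('knn', KNeighborsClassifier())],
    voting='soft'  # averages predict_proba across the base models
)
soft_clf.fit(X, y)

# One averaged probability per class per sample; rows sum to 1
probs = soft_clf.predict_proba(X[:3])
print(probs.shape)  # (3, 2)
```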
Example
This example shows how to create a voting classifier combining logistic regression, decision tree, and k-nearest neighbors classifiers. It fits the combined model on the Iris dataset and prints the accuracy.
python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define base models
log_clf = LogisticRegression(max_iter=200)
dt_clf = DecisionTreeClassifier()
knn_clf = KNeighborsClassifier()

# Create voting classifier
voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('dt', dt_clf), ('knn', knn_clf)],
    voting='hard'
)

# Train voting classifier
voting_clf.fit(X_train, y_train)

# Predict and evaluate
y_pred = voting_clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Voting Classifier Accuracy: {accuracy:.2f}")
Output
Voting Classifier Accuracy: 0.98
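To judge whether the ensemble actually helps, it is worth comparing it against each base model on the same train/test split. A minimal sketch of that comparison:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

models = [('lr', LogisticRegression(max_iter=200)),
          ('dt', DecisionTreeClassifier(random_state=42)),
          ('knn', KNeighborsClassifier())]

# Accuracy of each base model on its own
for name, model in models:
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))

# Accuracy of the hard-voting ensemble on the same split
voting_clf = VotingClassifier(estimators=models, voting='hard')
voting_clf.fit(X_train, y_train)
ensemble_acc = accuracy_score(y_test, voting_clf.predict(X_test))
print('ensemble', ensemble_acc)
```

Note that VotingClassifier clones and refits the estimators you pass in, so fitting them beforehand does not affect the ensemble.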
Common Pitfalls
- Using voting='soft' requires all base models to support predict_proba; otherwise an error is raised.
- There is no need to fit the base models separately; VotingClassifier fits them internally.
- It is for classification tasks only; it does not work for regression (use VotingRegressor instead).
- If provided, weights must match the number of estimators.
python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Both of these models support predict_proba, so soft voting works
log_clf = LogisticRegression(max_iter=200)
dt_clf = DecisionTreeClassifier()

voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('dt', dt_clf)],
    voting='soft'
)
voting_clf.fit([[0, 0], [1, 1], [2, 2], [3, 3]], [0, 1, 1, 0])

# If any estimator lacked predict_proba (for example, SVC with
# probability=False), soft voting would raise an error instead.
Quick Reference
Summary tips for using VotingClassifier:
- Use voting='hard' for a majority vote, voting='soft' to average probabilities.
- Ensure all models support predict_proba if using soft voting.
- Set weights to give more influence to stronger models.
- Works only for classification problems.
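The weights parameter, mentioned above but not demonstrated earlier, takes one number per estimator. A minimal sketch giving logistic regression twice the influence of the decision tree (the model choices here are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

weighted_clf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=200)),
                ('dt', DecisionTreeClassifier(random_state=0))],
    voting='soft',
    weights=[2, 1]  # must match the number of estimators
)
weighted_clf.fit(X, y)

# With soft voting, weights scale each model's predicted probabilities
# before averaging
preds = weighted_clf.predict(X[:2])
print(preds)
```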
Key Takeaways
- Use VotingClassifier to combine multiple models for better classification results.
- Choose 'hard' voting for majority class or 'soft' voting to average probabilities.
- All models must support predict_proba when using soft voting.
- Weights can adjust the influence of each model in the voting process.
- VotingClassifier fits all base models internally; no need to fit them separately.