MLOps · How-To · Beginner · 3 min read

How to Use Voting Classifier in sklearn with Python

Use VotingClassifier from sklearn.ensemble to combine multiple models by specifying them as estimators and choosing a voting method like 'hard' or 'soft'. Fit the voting classifier on your training data and use it to predict or evaluate performance.
📐

Syntax

The VotingClassifier combines several base models to improve prediction accuracy. You provide a list of (name, model) pairs as estimators. The voting parameter controls how predictions are combined: 'hard' uses majority voting, and 'soft' averages predicted probabilities.

  • estimators: List of tuples with model names and instances.
  • voting: 'hard' or 'soft' (default is 'hard').
  • weights: Optional list to give different importance to models.
python
from sklearn.ensemble import VotingClassifier

voting_clf = VotingClassifier(
    estimators=[('model1', model1), ('model2', model2), ('model3', model3)],
    voting='hard',
    weights=None
)
💻

Example

This example shows how to create a voting classifier combining logistic regression, decision tree, and k-nearest neighbors classifiers. It fits the combined model on the Iris dataset and prints the accuracy.

python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define base models
log_clf = LogisticRegression(max_iter=200)
dt_clf = DecisionTreeClassifier()
knn_clf = KNeighborsClassifier()

# Create voting classifier
voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('dt', dt_clf), ('knn', knn_clf)],
    voting='hard'
)

# Train voting classifier
voting_clf.fit(X_train, y_train)

# Predict and evaluate
y_pred = voting_clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Voting Classifier Accuracy: {accuracy:.2f}")
Output
Voting Classifier Accuracy: 0.98
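As a quick variant on the example above, the same ensemble can use voting='soft' to average class probabilities instead of counting votes. Accuracy is typically similar on Iris, though the exact number may differ from the hard-voting run, so no output is claimed here:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Same three base models, but voting='soft' averages predict_proba outputs
soft_clf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=200)),
                ('dt', DecisionTreeClassifier(random_state=42)),
                ('knn', KNeighborsClassifier())],
    voting='soft'
)
soft_clf.fit(X_train, y_train)
print(f"Soft Voting Accuracy: {accuracy_score(y_test, soft_clf.predict(X_test)):.2f}")
```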
⚠️

Common Pitfalls

  • Using voting='soft' requires every base model to implement predict_proba; otherwise an AttributeError is raised when predicting.
  • Do not pre-fit the base models; VotingClassifier fits clones of them internally when you call fit.
  • VotingClassifier handles classification only; use VotingRegressor for regression tasks.
  • If weights is provided, its length must equal the number of estimators, or fit raises a ValueError.
python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [2, 3], [3, 2], [3, 3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Incorrect: SVC does not expose predict_proba unless probability=True,
# so soft voting fails with an AttributeError at prediction time
broken_clf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=200)), ('svc', SVC())],
    voting='soft'
)
# broken_clf.fit(X, y).predict(X)  # AttributeError: predict_proba not available

# Correct: enable probability estimates on SVC (or switch to voting='hard')
voting_clf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=200)),
                ('svc', SVC(probability=True))],
    voting='soft'
)
voting_clf.fit(X, y)
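The weights-length pitfall from the list above can be sketched the same way: sklearn raises a ValueError at fit time when the weights list and the estimator list differ in length.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [1, 1], [2, 2], [3, 3]]
y = [0, 1, 1, 0]

# Two estimators but three weights: fit raises a ValueError
bad_clf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=200)),
                ('dt', DecisionTreeClassifier())],
    voting='hard',
    weights=[2, 1, 1],
)
try:
    bad_clf.fit(X, y)
except ValueError as e:
    print("ValueError:", e)
```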
📊

Quick Reference

Summary tips for using VotingClassifier:

  • Use voting='hard' for majority vote, voting='soft' to average probabilities.
  • Ensure all models support predict_proba if using soft voting.
  • Set weights to give more influence to stronger models.
  • Works only for classification problems; for regression, use VotingRegressor instead.
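One more handy detail: after fitting, each fitted base model is accessible by name through the named_estimators_ attribute, which is useful for inspecting or reusing individual models (the attribute is part of sklearn's documented API; the depth check here is just an illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
voting_clf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=200)),
                ('dt', DecisionTreeClassifier(random_state=0))],
    voting='hard'
).fit(X, y)

# Fitted base models are exposed by name via named_estimators_
print(voting_clf.named_estimators_['dt'].get_depth())
```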

Key Takeaways

Use VotingClassifier to combine multiple models for better classification results.
Choose 'hard' voting for majority class or 'soft' voting to average probabilities.
All models must support predict_proba when using soft voting.
Weights can adjust the influence of each model in the voting process.
VotingClassifier fits all base models internally; no need to fit them separately.