How to Use Voting Classifier in sklearn with Python
Use VotingClassifier from sklearn.ensemble to combine multiple models by passing them as estimators and choosing a voting method, either 'hard' or 'soft'. Fit the voting classifier on your training data, then use it to predict or to evaluate performance.

Syntax
The VotingClassifier combines several base models to improve prediction accuracy. You provide a list of (name, model) pairs as estimators. The voting parameter controls how predictions are combined: 'hard' uses majority voting, and 'soft' averages predicted probabilities.
- estimators: List of (name, model) tuples.
- voting: 'hard' or 'soft' (default is 'hard').
- weights: Optional list giving different importance to models.
python
from sklearn.ensemble import VotingClassifier

voting_clf = VotingClassifier(
    estimators=[('model1', model1), ('model2', model2), ('model3', model3)],
    voting='hard',
    weights=None
)
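With voting='soft', the ensemble also exposes predict_proba, returning the averaged class probabilities. A minimal sketch, using a synthetic dataset and two probability-capable base models chosen for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Toy two-class dataset for illustration
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

soft_clf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=200)),
                ('knn', KNeighborsClassifier())],
    voting='soft'  # averages predict_proba across the base models
)
soft_clf.fit(X, y)

# One averaged probability per class per sample; rows sum to 1
probs = soft_clf.predict_proba(X[:3])
print(probs.shape)  # (3, 2)
```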
Example
This example shows how to create a voting classifier combining logistic regression, decision tree, and k-nearest neighbors classifiers. It fits the combined model on the Iris dataset and prints the accuracy.
python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define base models
log_clf = LogisticRegression(max_iter=200)
dt_clf = DecisionTreeClassifier()
knn_clf = KNeighborsClassifier()

# Create voting classifier
voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('dt', dt_clf), ('knn', knn_clf)],
    voting='hard'
)

# Train voting classifier
voting_clf.fit(X_train, y_train)

# Predict and evaluate
y_pred = voting_clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Voting Classifier Accuracy: {accuracy:.2f}")
Output
Voting Classifier Accuracy: 0.98
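To judge whether the ensemble actually helps, it is worth comparing it against each base model on the same train/test split. A minimal sketch of that comparison:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

models = [('lr', LogisticRegression(max_iter=200)),
          ('dt', DecisionTreeClassifier(random_state=42)),
          ('knn', KNeighborsClassifier())]

# Accuracy of each base model on its own
for name, model in models:
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))

# Accuracy of the hard-voting ensemble on the same split
voting_clf = VotingClassifier(estimators=models, voting='hard')
voting_clf.fit(X_train, y_train)
ensemble_acc = accuracy_score(y_test, voting_clf.predict(X_test))
print('ensemble', ensemble_acc)
```

Note that VotingClassifier clones and refits the estimators you pass in, so fitting them beforehand does not affect the ensemble.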
Common Pitfalls
- Using voting='soft' requires all base models to support predict_proba; otherwise an error is raised.
- There is no need to fit the base models separately; VotingClassifier fits them internally.
- It is for classification tasks only; it does not work for regression (use VotingRegressor instead).
- If provided, weights must match the number of estimators.
python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Both of these models support predict_proba, so soft voting works
log_clf = LogisticRegression(max_iter=200)
dt_clf = DecisionTreeClassifier()

voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('dt', dt_clf)],
    voting='soft'
)
voting_clf.fit([[0, 0], [1, 1], [2, 2], [3, 3]], [0, 1, 1, 0])

# If any estimator lacked predict_proba (for example, SVC with
# probability=False), soft voting would raise an error instead.
Quick Reference
Summary tips for using VotingClassifier:
- Use voting='hard' for a majority vote, voting='soft' to average probabilities.
- Ensure all models support predict_proba if using soft voting.
- Set weights to give more influence to stronger models.
- Works only for classification problems.
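The weights parameter, mentioned above but not demonstrated earlier, takes one number per estimator. A minimal sketch giving logistic regression twice the influence of the decision tree (the model choices here are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

weighted_clf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=200)),
                ('dt', DecisionTreeClassifier(random_state=0))],
    voting='soft',
    weights=[2, 1]  # must match the number of estimators
)
weighted_clf.fit(X, y)

# With soft voting, weights scale each model's predicted probabilities
# before averaging
preds = weighted_clf.predict(X[:2])
print(preds)
```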
Key Takeaways
- Use VotingClassifier to combine multiple models for better classification results.
- Choose 'hard' voting for majority class or 'soft' voting to average probabilities.
- All models must support predict_proba when using soft voting.
- Weights can adjust the influence of each model in the voting process.
- VotingClassifier fits all base models internally; no need to fit them separately.