How to Use AdaBoost Classifier in Python with sklearn
Use AdaBoostClassifier from sklearn.ensemble by creating an instance, fitting it to your training data with fit(), and predicting with predict(). It combines weak learners into a strong classifier for better accuracy.
Syntax
The basic syntax to use AdaBoost classifier is:
- AdaBoostClassifier(estimator=None, n_estimators=50, learning_rate=1.0, random_state=None): creates the AdaBoost model.
- fit(X_train, y_train): trains the model on your data.
- predict(X_test): predicts labels for new data.
Parameters explained:
- estimator: The weak learner to boost; the default is a decision stump.
- n_estimators: Number of weak learners to combine.
- learning_rate: Weight applied to each weak learner.
- random_state: Controls randomness for reproducibility.
```python
from sklearn.ensemble import AdaBoostClassifier

model = AdaBoostClassifier(
    estimator=None,     # default: decision tree stump
    n_estimators=50,    # number of weak learners
    learning_rate=1.0,  # contribution of each learner
    random_state=42     # for reproducible results
)
model.fit(X_train, y_train)             # train model
predictions = model.predict(X_test)     # predict labels
```
Example
This example shows how to train and test AdaBoost on the Iris dataset, a simple flower classification task.
```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create AdaBoost model
model = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=42)

# Train model
model.fit(X_train, y_train)

# Predict on test data
predictions = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2f}")
```
Output
Accuracy: 0.98
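Beyond a single accuracy number, you can watch how the ensemble improves as learners are added. The sketch below reuses the same Iris split as above and calls staged_predict(), which yields the ensemble's predictions after each boosting round:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

model = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=42)
model.fit(X_train, y_train)

# staged_predict yields predictions after each boosting round,
# so we can track how test accuracy evolves as learners accumulate
staged_scores = [
    accuracy_score(y_test, pred) for pred in model.staged_predict(X_test)
]
print(f"After 10 learners: {staged_scores[9]:.2f}")
print(f"After all learners: {staged_scores[-1]:.2f}")
```

If accuracy plateaus or degrades after some round, that round count is a reasonable cap for n_estimators.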
Common Pitfalls
Common mistakes when using AdaBoost include:
- Not scaling features when using base estimators sensitive to feature scale.
- Setting n_estimators too high, which can cause overfitting.
- Ignoring random_state, making results non-reproducible.
- Using incompatible base estimators that do not support sample weighting.
Always check that your base estimator supports sample weights, as AdaBoost relies on re-weighting the training set each round to focus on hard examples.
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

# Wrong: a base estimator whose fit() does not accept sample_weight
# will raise an error or give poor results
# base_estimator = SomeEstimatorWithoutSampleWeightSupport()

# Right: use DecisionTreeClassifier with max_depth=1 (a decision stump)
base_estimator = DecisionTreeClassifier(max_depth=1)
model = AdaBoostClassifier(estimator=base_estimator, n_estimators=50, random_state=42)
# Then fit and predict as usual
```
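One way to check compatibility up front is sklearn's has_fit_parameter helper, which inspects whether an estimator's fit() signature accepts sample_weight. A quick sketch; KNeighborsClassifier is used here only as an example of an estimator without sample-weight support:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils.validation import has_fit_parameter

# Decision stumps accept sample_weight, so they work with AdaBoost
stump = DecisionTreeClassifier(max_depth=1)
print(has_fit_parameter(stump, "sample_weight"))   # True

# KNeighborsClassifier.fit() has no sample_weight parameter,
# so it cannot be boosted directly
knn = KNeighborsClassifier()
print(has_fit_parameter(knn, "sample_weight"))     # False
```

Running this check before fitting gives a clearer error than letting AdaBoost fail mid-training.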
Quick Reference
Tips for using AdaBoostClassifier:
- Default base estimator is a decision stump (depth=1 tree).
- Use n_estimators to control model complexity.
- Adjust learning_rate to balance the contribution of each learner.
- Set random_state for reproducible experiments.
- Works well on small to medium datasets, but is sensitive to noisy labels and outliers, since each boosting round up-weights misclassified points.
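Since n_estimators and learning_rate interact, it can help to tune them together rather than one at a time. A minimal sketch using a small grid search on Iris; the grid values are illustrative, not recommendations:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Search a small grid over the two main knobs
param_grid = {
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.1, 0.5, 1.0],
}
search = GridSearchCV(
    AdaBoostClassifier(random_state=42),
    param_grid,
    cv=5,                 # 5-fold cross-validation
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.2f}")
```

A common pattern is a smaller learning_rate paired with more estimators, trading training time for smoother convergence.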
Key Takeaways
Use AdaBoostClassifier from sklearn.ensemble to combine weak learners into a strong classifier.
Fit the model with fit() and predict new data with predict().
Choose a base estimator that supports sample weights, like a decision stump.
Tune n_estimators and learning_rate to avoid overfitting or underfitting.
Set random_state for consistent results across runs.