MLOps · How-To · Beginner · 3 min read

How to Use AdaBoost Classifier in Python with sklearn

Use AdaBoostClassifier from sklearn.ensemble by creating an instance, fitting it to your training data with fit(), and predicting with predict(). It combines weak learners to build a strong classifier for better accuracy.
📐

Syntax

The basic syntax to use AdaBoost classifier is:

  • AdaBoostClassifier(estimator=None, n_estimators=50, learning_rate=1.0, random_state=None): Creates the AdaBoost model.
  • fit(X_train, y_train): Trains the model on your data.
  • predict(X_test): Predicts labels for new data.

Parameters explained:

  • estimator: The weak learner to boost, default is a decision stump.
  • n_estimators: Number of weak learners to combine.
  • learning_rate: Shrinks each weak learner's contribution; lower values usually need more estimators.
  • random_state: Controls randomness for reproducibility.
python
from sklearn.ensemble import AdaBoostClassifier

model = AdaBoostClassifier(
    estimator=None,  # default decision tree stump
    n_estimators=50,      # number of weak learners
    learning_rate=1.0,    # contribution of each learner
    random_state=42       # for reproducible results
)

model.fit(X_train, y_train)  # train model
predictions = model.predict(X_test)  # predict labels
💻

Example

This example shows how to train and test AdaBoost on the Iris dataset, a simple flower classification task.

python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create AdaBoost model
model = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=42)

# Train model
model.fit(X_train, y_train)

# Predict on test data
predictions = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2f}")
Output
Accuracy: 0.98
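Accuracy is a single number, and on a three-class task it can hide class-level mistakes. As an optional follow-up (repeating the same setup so the snippet runs on its own), sklearn's classification_report breaks the result down into precision, recall, and F1 per class:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Same data and model settings as the Iris example above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=42)
model.fit(X_train, y_train)

# Per-class precision, recall, and F1 instead of one accuracy number
report = classification_report(
    y_test, model.predict(X_test), target_names=load_iris().target_names
)
print(report)
```

If one flower class scores noticeably worse than the others, overall accuracy alone would not have shown it.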
⚠️

Common Pitfalls

Common mistakes when using AdaBoost include:

  • Not scaling features when using base estimators sensitive to feature scale (the default decision stump does not need scaling).
  • Setting n_estimators too high, which can cause overfitting.
  • Ignoring random_state, making results non-reproducible.
  • Using incompatible base estimators that do not support sample weighting.

Always check your base estimator supports sample weights, as AdaBoost relies on them to focus on hard examples.

python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

# Wrong: base estimator that does not support sample_weight
# This will raise an error or give poor results
# base_estimator = SomeEstimatorWithoutSampleWeightSupport()

# Right: use DecisionTreeClassifier with max_depth=1 (decision stump)
stump = DecisionTreeClassifier(max_depth=1)  # named to avoid confusion with the deprecated base_estimator parameter
model = AdaBoostClassifier(estimator=stump, n_estimators=50, random_state=42)

# Then fit and predict as usual
📊

Quick Reference

Tips for using AdaBoostClassifier:

  • Default base estimator is a decision stump (depth=1 tree).
  • Use n_estimators to control model complexity.
  • Adjust learning_rate to balance contribution of each learner.
  • Set random_state for reproducible experiments.
  • Works well on small to medium datasets, but is sensitive to noisy labels and outliers.
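One way to act on the n_estimators and learning_rate tips is staged_predict, which yields the ensemble's predictions after each boosting round, so you can see where test accuracy stops improving. This is a sketch on Iris; the best round will differ per dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit with more rounds than we probably need, then inspect each stage
model = AdaBoostClassifier(n_estimators=200, learning_rate=0.5, random_state=42)
model.fit(X_train, y_train)

# Test accuracy after each boosting round
staged_scores = [
    accuracy_score(y_test, y_pred) for y_pred in model.staged_predict(X_test)
]
best_round = staged_scores.index(max(staged_scores)) + 1
print(f"Best test accuracy {max(staged_scores):.2f} at round {best_round}")
```

If accuracy plateaus or drops after some round, refit with n_estimators near that round rather than keeping the extra learners.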

Key Takeaways

Use AdaBoostClassifier from sklearn.ensemble to combine weak learners into a strong classifier.
Fit the model with fit() and predict new data with predict().
Choose a base estimator that supports sample weights, like a decision stump.
Tune n_estimators and learning_rate to avoid overfitting or underfitting.
Set random_state for consistent results across runs.