
Bagging concept in ML Python

Introduction
Bagging (bootstrap aggregating) makes predictions more accurate by training many copies of a simple model on different random samples of the data and combining their results. Averaging over many models cancels out the mistakes any single model makes. Bagging is a good fit in situations like these:
When a single model makes unstable or noisy predictions.
When you want to improve accuracy without changing the model type.
When you have enough data to create multiple training sets.
When you want to reduce overfitting in decision trees.
When you want a simple way to boost model performance.
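To make the idea concrete, here is a minimal from-scratch sketch of bagging: draw bootstrap samples, fit one small tree per sample, and combine predictions by majority vote. The synthetic dataset and the choice of 10 trees are assumptions for illustration, not part of the lesson.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data (assumed for illustration)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

rng = np.random.default_rng(42)
models = []
for _ in range(10):
    # Bootstrap sample: draw row indices with replacement
    idx = rng.integers(0, len(X), size=len(X))
    models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Combine the 10 models' predictions by majority vote
all_preds = np.array([m.predict(X) for m in models])  # shape (10, 200)
votes = (all_preds.mean(axis=0) >= 0.5).astype(int)
print(votes[:10])
```

This is exactly what `BaggingClassifier` automates: the bootstrap sampling, the repeated fitting, and the vote.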
Syntax
ML Python
from sklearn.ensemble import BaggingClassifier

bagging = BaggingClassifier(estimator=SomeModel(), n_estimators=10, random_state=42)
bagging.fit(X_train, y_train)
predictions = bagging.predict(X_test)
estimator is the base model you want to repeat and combine, such as a decision tree.
n_estimators is the number of copies of that model to train and combine.
Examples
Using decision trees as the base model repeated 5 times.
ML Python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

bagging = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=5)
bagging.fit(X_train, y_train)
predictions = bagging.predict(X_test)
Using logistic regression as the base model repeated 10 times.
ML Python
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression

bagging = BaggingClassifier(estimator=LogisticRegression(), n_estimators=10)
bagging.fit(X_train, y_train)
predictions = bagging.predict(X_test)
Sample Model
This program trains 10 decision trees on different random samples of the iris data and combines their predictions. It then prints the accuracy on test data.
ML Python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Create bagging model with decision trees
bagging = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=10, random_state=1)

# Train model
bagging.fit(X_train, y_train)

# Predict
predictions = bagging.predict(X_test)

# Check accuracy
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2f}")
Important Notes
Bagging works best with models that have high variance, like decision trees.
Each model trains on a random sample of the data with replacement (bootstrap sampling).
Combining many models helps reduce errors caused by any single model.
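The "sampling with replacement" in the second note means the same row can be drawn more than once, and some rows are skipped entirely. A tiny NumPy sketch (the 10-element array is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)

# Sampling WITH replacement: the same row can appear more than once
sample = rng.choice(data, size=len(data), replace=True)
print("bootstrap sample:", sample)

# On average only about 63% of the unique rows appear in each bootstrap sample
print("unique rows drawn:", len(np.unique(sample)))
```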
Summary
Bagging means training many models on random samples and combining their results.
It helps make predictions more stable and accurate.
It is easy to use and works well with decision trees.