
Bagging concept in ML Python - ML Experiment: Train & Evaluate

Experiment - Bagging concept
Problem: You have a classification task on the Iris dataset. The current model is a single decision tree that achieves 98% accuracy on the training data but only 85% on the validation data.
Current Metrics: Training accuracy: 98%, validation accuracy: 85%
Issue: The model overfits the training data, which is why accuracy drops on unseen validation data.
Your Task
Reduce overfitting by using bagging to improve validation accuracy to at least 90% while keeping training accuracy below 95%.
Use the Iris dataset only.
Use decision trees as base learners.
Implement bagging with scikit-learn's BaggingClassifier.
Do not change the dataset or use other models.
Solution
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

# Single decision tree model
single_tree = DecisionTreeClassifier(random_state=42)
single_tree.fit(X_train, y_train)
train_acc_single = accuracy_score(y_train, single_tree.predict(X_train))
val_acc_single = accuracy_score(y_val, single_tree.predict(X_val))

# Bagging with decision trees
bagging_model = BaggingClassifier(
    # Note: this parameter was renamed from base_estimator to estimator
    # in scikit-learn 1.2; use base_estimator on older versions.
    estimator=DecisionTreeClassifier(random_state=42),
    n_estimators=50,
    random_state=42
)
bagging_model.fit(X_train, y_train)
train_acc_bagging = accuracy_score(y_train, bagging_model.predict(X_train))
val_acc_bagging = accuracy_score(y_val, bagging_model.predict(X_val))

print(f"Single Tree - Training Accuracy: {train_acc_single:.2f}, Validation Accuracy: {val_acc_single:.2f}")
print(f"Bagging - Training Accuracy: {train_acc_bagging:.2f}, Validation Accuracy: {val_acc_bagging:.2f}")
Replaced the single decision tree with a BaggingClassifier of 50 decision trees.
Each tree is trained on a bootstrap sample (a random subset of the training data, drawn with replacement), which reduces overfitting.
Fixed random_state=42 on both the ensemble and the base DecisionTreeClassifier for reproducibility.
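To see what "bootstrap sample" means in the second point above, here is a minimal NumPy sketch of drawing one such sample. This is an illustration of the idea, not scikit-learn's internal implementation; the array sizes are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 10  # pretend the training set has 10 rows

# A bootstrap sample draws n_samples row indices *with replacement*,
# so some rows appear more than once and others are left out entirely.
indices = rng.integers(0, n_samples, size=n_samples)
in_bag = np.unique(indices)
out_of_bag = np.setdiff1d(np.arange(n_samples), in_bag)

print("bootstrap indices:", indices)
print("out-of-bag rows:  ", out_of_bag)
```

On average about a third of the rows end up out-of-bag for any one tree, which is exactly why each tree in the ensemble sees a slightly different view of the data.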
Results Interpretation

Before Bagging: Training accuracy was very high (98%) but validation accuracy was lower (85%), showing overfitting.

After Bagging: Training accuracy decreased slightly (~93%), but validation accuracy improved (~92%), showing better generalization.

Bagging reduces overfitting by averaging many models trained on different data samples, improving validation accuracy and model stability.
Bonus Experiment
Try increasing the number of trees in the bagging ensemble to 100 and observe the effect on validation accuracy.
💡 Hint
More trees usually improve stability but increase training time. Watch for diminishing returns.