Bagging concept in ML Python - ML Experiment: Train & Evaluate

Experiment - Bagging concept

Problem: You have a classification task using the Iris dataset. The current model is a single decision tree that achieves 98% accuracy on training data but only 85% on validation data.

Current Metrics: Training accuracy: 98%, Validation accuracy: 85%

Issue: The model overfits the training data, causing lower accuracy on unseen validation data.

Your Task

Reduce overfitting by using bagging to improve validation accuracy to at least 90% while keeping training accuracy below 95%.

Use the Iris dataset only.

Use decision trees as base learners.

Implement bagging with scikit-learn's BaggingClassifier.

Do not change the dataset or use other models.

Solution

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

# Single decision tree model
single_tree = DecisionTreeClassifier(random_state=42)
single_tree.fit(X_train, y_train)
train_acc_single = accuracy_score(y_train, single_tree.predict(X_train))
val_acc_single = accuracy_score(y_val, single_tree.predict(X_val))

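# Bagging with decision trees
# Note: scikit-learn 1.2 renamed `base_estimator` to `estimator`;
# the old name was removed in 1.4, so use `base_estimator` only on older versions.
bagging_model = BaggingClassifier(
    estimator=DecisionTreeClassifier(random_state=42),
    n_estimators=50,
    random_state=42
)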
bagging_model.fit(X_train, y_train)
train_acc_bagging = accuracy_score(y_train, bagging_model.predict(X_train))
val_acc_bagging = accuracy_score(y_val, bagging_model.predict(X_val))

print(f"Single Tree - Training Accuracy: {train_acc_single:.2f}, Validation Accuracy: {val_acc_single:.2f}")
print(f"Bagging - Training Accuracy: {train_acc_bagging:.2f}, Validation Accuracy: {val_acc_bagging:.2f}")

Replaced single decision tree with BaggingClassifier using 50 decision trees.

Each tree is trained on a bootstrap sample (rows drawn at random with replacement) of the training data, and the trees' predictions are combined to reduce overfitting.

Kept random_state fixed for reproducibility.

Added random_state=42 to base DecisionTreeClassifier for reproducibility.

Results Interpretation

Before Bagging: Training accuracy was very high (98%) but validation accuracy was lower (85%), showing overfitting.

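After Bagging: Training accuracy decreased slightly (~93%), but validation accuracy improved (~92%), showing better generalization. These are the exercise's target figures; the numbers the code prints depend on the train/validation split, and fully grown trees can keep training accuracy near 100% unless tree depth or sample size is limited.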
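The task asks for training accuracy below 95%, and fully grown trees tend to fit the training set almost perfectly. One possible way to pull training accuracy down is to limit the depth of each base tree and the size of each bootstrap sample. The sketch below is a variant under assumed settings: `max_depth=2` and `max_samples=0.5` are illustrative values, not part of the original solution, and whether they hit the target depends on the split. The base tree is passed positionally so the call does not depend on the `estimator` / `base_estimator` parameter name.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Assumed settings (not from the original solution): shallow trees,
# each trained on a bootstrap sample half the size of the training set.
shallow_bagging = BaggingClassifier(
    DecisionTreeClassifier(max_depth=2, random_state=42),
    n_estimators=50,
    max_samples=0.5,
    random_state=42,
)
shallow_bagging.fit(X_train, y_train)

train_acc_shallow = accuracy_score(y_train, shallow_bagging.predict(X_train))
val_acc_shallow = accuracy_score(y_val, shallow_bagging.predict(X_val))
print(
    f"Shallow bagging - Training Accuracy: {train_acc_shallow:.2f}, "
    f"Validation Accuracy: {val_acc_shallow:.2f}"
)
```

If training accuracy is still above the target, lowering `max_depth` or `max_samples` further is the natural next thing to try.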

Bagging reduces overfitting by combining many models trained on different bootstrap samples of the data. Averaging their predictions lowers variance, which tends to improve validation accuracy and model stability.
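To make the mechanism concrete, here is a minimal hand-rolled sketch of bagging: draw bootstrap samples, fit one tree per sample, and take a majority vote. It is an illustration of the idea, not scikit-learn's implementation; `BaggingClassifier` may combine predictions differently (for example by averaging class probabilities). The names `n_trees`, `all_preds`, and `manual_pred` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42
)

rng = np.random.default_rng(42)
n_trees = 50  # illustrative ensemble size
n_train = len(X_train)

trees = []
for _ in range(n_trees):
    # Bootstrap sample: draw n_train row indices with replacement.
    idx = rng.integers(0, n_train, size=n_train)
    tree = DecisionTreeClassifier(random_state=0)
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Shape (n_trees, n_val): one row of predicted class labels per tree.
all_preds = np.stack([tree.predict(X_val) for tree in trees])

# Count votes per class for each validation sample, then pick the class
# with the most votes (ties go to the lowest class label).
n_classes = len(np.unique(y))
votes = np.apply_along_axis(np.bincount, 0, all_preds, minlength=n_classes)
manual_pred = votes.argmax(axis=0)

manual_val_acc = (manual_pred == y_val).mean()
print(f"Manual bagging - Validation Accuracy: {manual_val_acc:.2f}")
```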

Bonus Experiment

Try increasing the number of trees in the bagging ensemble to 100 and observe the effect on validation accuracy.

💡 Hint

More trees usually improve stability but increase training time. Watch for diminishing returns.
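The bonus experiment can be sketched as a small sweep over ensemble sizes. The value 10 is added here only as an assumed lower comparison point; the timing uses `time.perf_counter` and will vary by machine. The base tree is passed positionally so the call works regardless of the `estimator` / `base_estimator` parameter name.

```python
import time

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42
)

results = {}
for n in (10, 50, 100):  # 10 is an assumed extra point for comparison
    model = BaggingClassifier(
        DecisionTreeClassifier(random_state=42),
        n_estimators=n,
        random_state=42,
    )
    start = time.perf_counter()
    model.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start

    # Store validation accuracy for each ensemble size.
    results[n] = accuracy_score(y_val, model.predict(X_val))
    print(f"n_estimators={n}: Validation Accuracy: {results[n]:.2f}, "
          f"fit time: {fit_seconds:.3f}s")
```

On a dataset as small as Iris the accuracy differences between ensemble sizes may be tiny or absent, which is itself an instance of diminishing returns.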

Practice

(1/5)

1. What is the main idea behind bagging in machine learning?

easy

A. Training multiple models on random samples and combining their results

B. Using a single model with all data to avoid randomness

C. Reducing the number of features to simplify the model

D. Increasing the depth of a decision tree to improve accuracy

Solution

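Step 1: Understand bagging concept

Bagging trains multiple models, each on a random sample of the training data.

Step 2: Identify the purpose of bagging

Combining the models' predictions reduces overfitting and improves stability.

Final Answer: A. Training multiple models on random samples and combining their results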

Quick Check:

Solution

Step 1: Recall scikit-learn bagging syntax

Step 2: Match parameters to options

Final Answer:

Quick Check:

Solution

Step 1: Understand the code output

Step 2: Interpret the printed value meaning

Final Answer:

Quick Check:

Solution

Step 1: Check parameter types

Step 2: Identify error cause

Final Answer:

Quick Check:

Solution

Step 1: Understand bagging effect on overfitting

Step 2: Choose model depth and sampling

Final Answer:

Quick Check: