ML Python · ~20 mins

Why Ensembles Outperform Single Models - An Experiment to Prove It

Experiment - Why ensembles outperform single models
Problem: We want to improve prediction accuracy on a classification task. Currently, a single decision tree model is used.
Current Metrics: Training accuracy: 95%, Validation accuracy: 80%
Issue: The single decision tree overfits the training data, which lowers its accuracy on validation data.
Your Task
Increase validation accuracy to above 85% by using ensemble methods while keeping training accuracy below 93%.
Use only ensemble methods based on decision trees (e.g., Random Forest or Gradient Boosting).
Do not change the dataset or perform additional feature engineering.
Solution
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load data
X, y = load_breast_cancer(return_X_y=True)

# Split data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Single decision tree model
single_tree = DecisionTreeClassifier(random_state=42)
single_tree.fit(X_train, y_train)
train_acc_tree = accuracy_score(y_train, single_tree.predict(X_train))
val_acc_tree = accuracy_score(y_val, single_tree.predict(X_val))

# Ensemble model: Random Forest
rf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
rf.fit(X_train, y_train)
train_acc_rf = accuracy_score(y_train, rf.predict(X_train))
val_acc_rf = accuracy_score(y_val, rf.predict(X_val))

print(f"Single Tree - Train Accuracy: {train_acc_tree:.2f}, Validation Accuracy: {val_acc_tree:.2f}")
print(f"Random Forest - Train Accuracy: {train_acc_rf:.2f}, Validation Accuracy: {val_acc_rf:.2f}")
Replaced the single decision tree with a Random Forest ensemble.
Set the number of trees to 100 so their predictions are averaged, reducing variance.
Limited each tree's depth to 5 to keep individual trees simple and curb overfitting.
Results Interpretation

Before: Training accuracy 95%, Validation accuracy 80% (overfitting)

After: Training accuracy 90%, Validation accuracy 88% (better generalization)

Ensembles like Random Forest reduce overfitting by combining many simple models. This averaging lowers variance and improves accuracy on new data.
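To see the variance-reduction effect directly, you can vary the number of trees and watch validation accuracy. This is a minimal sketch using the same dataset and split as the solution; the `n_estimators` values chosen here are illustrative, not tuned.

```python
# Sketch: more trees -> predictions are averaged over a larger ensemble,
# which reduces variance. Same data and split as the solution above.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

for n in (1, 10, 100):
    rf = RandomForestClassifier(n_estimators=n, max_depth=5, random_state=42)
    rf.fit(X_train, y_train)
    val_acc = accuracy_score(y_val, rf.predict(X_val))
    print(f"{n:>3} trees -> validation accuracy {val_acc:.2f}")
```

A single depth-5 tree is essentially one vote; as the ensemble grows, its individual errors average out, and validation accuracy typically stabilizes at a higher level.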
Bonus Experiment
Try Gradient Boosting ensemble instead of Random Forest and compare validation accuracy.
💡 Hint
Use sklearn's GradientBoostingClassifier with learning_rate=0.1 and max_depth=3 to see whether boosting improves results.
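One possible starting point for the bonus experiment, reusing the same dataset and split as the solution. The hyperparameters follow the hint (learning_rate=0.1, max_depth=3); they are a reasonable default, not tuned values.

```python
# Sketch: Gradient Boosting on the same split as the Random Forest solution.
# Boosting fits trees sequentially, each correcting the previous ones' errors.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                max_depth=3, random_state=42)
gb.fit(X_train, y_train)
train_acc_gb = accuracy_score(y_train, gb.predict(X_train))
val_acc_gb = accuracy_score(y_val, gb.predict(X_val))
print(f"Gradient Boosting - Train Accuracy: {train_acc_gb:.2f}, "
      f"Validation Accuracy: {val_acc_gb:.2f}")
```

Compare `val_acc_gb` against the Random Forest's validation accuracy; note that boosting can overfit if `n_estimators` or `max_depth` is set too high, so keep an eye on the train/validation gap.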