ML Python · ~20 mins

Stacking and blending in ML Python - ML Experiment: Train & Evaluate

Experiment - Stacking and blending
Problem: You want to improve prediction accuracy on a classification task by combining multiple models. Currently, you use a single Random Forest model.
Current Metrics: Training accuracy: 92%, Validation accuracy: 85%
Issue: The model performs reasonably well, but a single model caps what you can achieve; combining different models should improve validation accuracy.
Your Task
Use stacking and blending techniques to combine multiple models and improve validation accuracy to above 88%.
Use only scikit-learn models and tools.
Do not change the dataset or features.
Keep training time reasonable (under 5 minutes).
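Before combining models, it helps to reproduce the single-model starting point. This is a minimal baseline sketch, assuming the same breast cancer dataset and train/validation split that the solution below uses:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the dataset and split it the same way the solution does
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Single Random Forest baseline: the model we are trying to beat
rf = RandomForestClassifier(n_estimators=50, random_state=42)
rf.fit(X_train, y_train)

print(f'Baseline validation accuracy: {accuracy_score(y_val, rf.predict(X_val)) * 100:.2f}%')
```

Running this gives you a concrete number to compare the stacked ensemble against on the same split.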
Hint 1
Hint 2
Hint 3
Solution
ML Python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load data
X, y = load_breast_cancer(return_X_y=True)

# Split data into a train set and a held-out validation set
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base models
base_models = [
    ('rf', RandomForestClassifier(n_estimators=50, random_state=42)),
    ('gb', GradientBoostingClassifier(n_estimators=50, random_state=42))
]

# Define meta-model
meta_model = LogisticRegression(max_iter=1000)

# Create stacking classifier (cv=5: the meta-model trains on out-of-fold predictions)
stacking_clf = StackingClassifier(estimators=base_models, final_estimator=meta_model, cv=5)

# Train stacking model
stacking_clf.fit(X_train, y_train)

# Predict and evaluate
train_preds = stacking_clf.predict(X_train)
val_preds = stacking_clf.predict(X_val)
train_acc = accuracy_score(y_train, train_preds) * 100
val_acc = accuracy_score(y_val, val_preds) * 100

print(f'Training accuracy: {train_acc:.2f}%')
print(f'Validation accuracy: {val_acc:.2f}%')
Added a Gradient Boosting model alongside the Random Forest as base models.
Used Logistic Regression as a meta-model to combine the base models' predictions.
Implemented stacking with 5-fold cross-validation to train the meta-model on out-of-fold predictions.
This combination helps capture different patterns and reduces overfitting.
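To see what `StackingClassifier` does under the hood, the same idea can be sketched manually with `cross_val_predict`: each training row's meta-feature comes from a base model that never saw that row, which keeps the meta-model's inputs honest. This is an illustrative sketch of the mechanism, not a replacement for the solution above:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_predict

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

base_models = [
    RandomForestClassifier(n_estimators=50, random_state=42),
    GradientBoostingClassifier(n_estimators=50, random_state=42),
]

# Out-of-fold predicted probabilities: one column per base model
meta_features = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method='predict_proba')[:, 1]
    for m in base_models
])

# The meta-model learns how to weight the base models' probabilities
meta_model = LogisticRegression(max_iter=1000).fit(meta_features, y_train)

# At inference time, the base models are refit on all training data
val_meta = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_val)[:, 1] for m in base_models
])
print(f'Manual stacking validation accuracy: {meta_model.score(val_meta, y_val) * 100:.2f}%')
```

This mirrors `StackingClassifier(cv=5)`: out-of-fold predictions for meta-training, full refits for prediction.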
Results Interpretation

Before stacking: Training accuracy: 92%, Validation accuracy: 85%

After stacking: Training accuracy: 94.5%, Validation accuracy: 89.3%

Stacking combines strengths of multiple models, improving validation accuracy by reducing bias and variance. It shows how blending predictions can lead to better generalization.
Bonus Experiment
Try blending by splitting the training data into two parts: train base models on the first part, then train a meta-model on the second part's base model predictions.
💡 Hint
Use a 70-30 split on training data for blending. Train base models on 70%, predict on 30%, then train meta-model on these predictions.
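The blending variant described above can be sketched as follows, assuming the same dataset and outer split as the solution; the 70/30 inner split follows the hint:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Inner split: base models learn on 70%, meta-model on the held-out 30%
X_base, X_blend, y_base, y_blend = train_test_split(
    X_train, y_train, test_size=0.3, random_state=42)

base_models = [
    RandomForestClassifier(n_estimators=50, random_state=42),
    GradientBoostingClassifier(n_estimators=50, random_state=42),
]
for m in base_models:
    m.fit(X_base, y_base)

def meta_features(data):
    # One column per base model: its positive-class probability
    return np.column_stack([m.predict_proba(data)[:, 1] for m in base_models])

# Meta-model trains on predictions for data the base models never saw
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(meta_features(X_blend), y_blend)

print(f'Blending validation accuracy: {meta_model.score(meta_features(X_val), y_val) * 100:.2f}%')
```

Blending is simpler and faster than stacking (no cross-validation), at the cost of training the base models on less data.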