Model Pipeline - Stacking and blending
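Stacking and blending are ensemble methods that combine several base models into a stronger one. Both feed the base models' predictions as input features to a final model (the meta-model), which learns how to weight them. Stacking usually builds those features from out-of-fold cross-validation predictions, while blending uses predictions on a single holdout split.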
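As a minimal sketch of the idea, scikit-learn's StackingClassifier wires base models and a final model together. The synthetic dataset, the choice of base models, and the variable names are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data, used only for illustration (an assumption, not from the text).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Base models whose predictions become the inputs of the final model.
base_models = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
]

# cv=5: the final model is trained on out-of-fold base-model predictions.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

# transform() exposes the meta-features the final model actually sees.
meta_features = stack.transform(X_test)
test_accuracy = stack.score(X_test, y_test)
print(meta_features.shape, round(test_accuracy, 3))
```

Whether the stacked model beats its best base model depends on the data, so it is worth comparing both on a held-out set.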
[Figure: training loss (y-axis, 0.2 to 0.5) plotted against epochs 1 to 5 (x-axis)]

| Epoch | Loss ↓ | Accuracy ↑ | Observation |
|---|---|---|---|
| 1 | 0.45 | 0.72 | Base models start learning, meta-model not trained yet |
| 2 | 0.38 | 0.78 | Base models improve, meta-model training begins |
| 3 | 0.32 | 0.83 | Meta-model learns to combine predictions better |
| 4 | 0.28 | 0.86 | Loss decreases steadily, accuracy increases |
| 5 | 0.25 | 0.88 | Training converges with good accuracy |
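What is the shape of X_blend_train if X_train has shape (1000, 10) and holdout_ratio=0.2?

    from sklearn.model_selection import train_test_split

    # Split off a holdout set that is reserved for training the blender
    X_train_full, X_holdout, y_train_full, y_holdout = train_test_split(
        X_train, y_train, test_size=holdout_ratio, random_state=42
    )

    # Fit the base model on the non-holdout portion
    # (assumes base_model is an unfitted scikit-learn estimator defined elsewhere)
    base_model.fit(X_train_full, y_train_full)

    # Base model predictions on holdout
    base_pred_holdout = base_model.predict(X_holdout)

    # Blending training data: one column of predictions per base model
    X_blend_train = base_pred_holdout.reshape(-1, 1)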
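The blending flow in the snippet above can be run end to end as a rough sketch. The synthetic data, the second base model, and the meta-model choice are assumptions added for illustration; they are not part of the original exercise.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a training set of shape (1000, 10).
X_train, y_train = make_classification(n_samples=1000, n_features=10, random_state=0)
holdout_ratio = 0.2

X_train_full, X_holdout, y_train_full, y_holdout = train_test_split(
    X_train, y_train, test_size=holdout_ratio, random_state=42
)

# Fit every base model on the non-holdout portion only.
base_models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=50, random_state=0),
]
for model in base_models:
    model.fit(X_train_full, y_train_full)

# One column of holdout predictions per base model.
X_blend_train = np.column_stack([m.predict(X_holdout) for m in base_models])

# The meta-model learns from holdout predictions and holdout labels.
meta_model = LogisticRegression()
meta_model.fit(X_blend_train, y_holdout)

print(X_train_full.shape, X_holdout.shape, X_blend_train.shape)
```

The row count of the blending matrix always equals the size of the holdout split, and the column count equals the number of base models.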
The following code raises ValueError: Found input variables with inconsistent numbers of samples. What is the likely cause?

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_predict

    base1 = LogisticRegression()
    base2 = RandomForestClassifier()

    # Out-of-fold predictions: one value per row of X_train
    pred1 = cross_val_predict(base1, X_train, y_train, cv=5)
    pred2 = cross_val_predict(base2, X_train, y_train, cv=5)

    # Meta-features: one column per base model
    X_meta = np.column_stack((pred1, pred2))

    meta_model = LogisticRegression()
    meta_model.fit(X_meta, y_train)
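One way this error can arise is when the meta-features and the labels come from different splits. The sketch below reproduces that situation on synthetic data; the specific mistake shown (passing holdout labels) is a hypothetical chosen for illustration and may not be the cause in every case.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic data, for illustration only.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# cross_val_predict returns one out-of-fold prediction per row of X_train.
pred = cross_val_predict(LogisticRegression(max_iter=1000), X_train, y_train, cv=5)
X_meta = pred.reshape(-1, 1)  # 240 rows, one column

meta_model = LogisticRegression()
error_message = None
try:
    # Hypothetical mistake: labels taken from a different split (60 rows).
    meta_model.fit(X_meta, y_holdout)
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# With matching row counts the fit succeeds.
assert X_meta.shape[0] == y_train.shape[0]
meta_model.fit(X_meta, y_train)
```

Checking X_meta.shape[0] against len(y) before fitting the meta-model is a cheap guard against this class of bug.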