Stacking and blending in ML Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Difference between stacking and blending

Which statement correctly describes the main difference between stacking and blending in ensemble learning?

A. Stacking combines models by averaging predictions, blending uses majority voting.
B. Blending requires no separate data for meta-model training, stacking does.
C. Blending trains base models sequentially, stacking trains them in parallel.
D. Stacking uses cross-validation to train the meta-model, while blending uses a holdout validation set.
💡 Hint

Think about how the meta-model is trained in each method.

Predict Output
intermediate
Output of stacking predictions code

What is the output of the following Python code snippet that performs stacking predictions?

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Base models
model1 = LogisticRegression(random_state=42)
model2 = DecisionTreeClassifier(random_state=42)

# Train base models
model1.fit(X_train, y_train)
model2.fit(X_train, y_train)

# Generate base predictions for test set
preds1 = model1.predict_proba(X_test)[:, 1]
preds2 = model2.predict_proba(X_test)[:, 1]

# Stack predictions as features
stacked_features = np.column_stack((preds1, preds2))

# Meta-model
meta_model = LogisticRegression(random_state=42)
meta_model.fit(stacked_features, y_test)

# Final predictions
final_preds = meta_model.predict(stacked_features)

print(sum(final_preds))
A. 18
B. 20
C. 12
D. 16
💡 Hint

Count how many final predictions are positive (1) in the test set.

Hyperparameter
advanced
Choosing meta-model for stacking

Which meta-model choice is generally best when stacking base models with diverse prediction scales and distributions?

A. A linear regression model without regularization
B. A logistic regression model with L2 regularization
C. A decision tree with max_depth=1
D. A k-nearest neighbors model with k=1
💡 Hint

Consider a model that handles different scales and avoids overfitting.

Metrics
advanced
Evaluating blending ensemble performance

You trained a blending ensemble with three base classifiers and a meta-model on a holdout set. Which metric best reflects the meta-model's ability to improve over base models?

A. Training loss of base models
B. Mean squared error of base models on training data
C. F1-score of the meta-model on the holdout set
D. Accuracy of the meta-model on the holdout set
💡 Hint

Think about a metric that balances precision and recall for classification.

🔧 Debug
expert
Debugging stacking code with data leakage

Consider this stacking code snippet. What is the main issue causing data leakage?

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

base_model = LogisticRegression(random_state=0)
base_model.fit(X_train, y_train)

# Using base model predictions on training data to train meta-model
train_preds = base_model.predict_proba(X_train)[:, 1]
meta_model = LogisticRegression(random_state=0)
meta_model.fit(train_preds.reshape(-1, 1), y_train)

# Predict on test data
test_preds = base_model.predict_proba(X_test)[:, 1]
final_preds = meta_model.predict(test_preds.reshape(-1, 1))

print(sum(final_preds))
A. The meta-model is trained on base model predictions from the same training data, causing overfitting.
B. The base model is not trained before generating predictions.
C. The test data is used to train the meta-model.
D. The base model predictions are not reshaped correctly before meta-model training.
💡 Hint

Think about how the meta-model training data is generated.

Practice

1. What is the main goal of stacking and blending in machine learning?
easy
A. To combine multiple models to improve prediction accuracy
B. To reduce the size of the dataset
C. To speed up training by using fewer models
D. To replace all base models with a single model

Solution

  1. Step 1: Understand the purpose of stacking and blending

    Stacking and blending are ensemble techniques that combine predictions from multiple models.
  2. Step 2: Identify the goal of combining models

    The goal is to improve prediction accuracy by leveraging strengths of different models.
  3. Final Answer:

    To combine multiple models to improve prediction accuracy -> Option A
  4. Quick Check:

    Stacking and blending = combine models for better accuracy [OK]
Hint: Stacking and blending combine models to boost accuracy [OK]
Common Mistakes:
  • Thinking stacking reduces dataset size
  • Believing stacking replaces base models
  • Confusing speed with accuracy improvement
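As a rough illustration of combining models, the sketch below uses scikit-learn's StackingClassifier on synthetic data. The dataset, the two base models, and names such as base_models and stack_accuracy are assumptions chosen for this example. Whether the ensemble beats its base models depends on the data, so no particular result is claimed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption for illustration only)
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Two different base models whose predictions will be combined
base_models = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
]

# The final estimator learns how to weight the base model predictions
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)
stack_accuracy = stack.score(X_test, y_test)

# Fit each base model on its own for comparison
base_accuracies = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in base_models
}

print("stack:", stack_accuracy)
print("base models:", base_accuracies)
```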
2. Which of the following correctly describes how stacking trains its final model?
easy
A. Using random subsets of features
B. Using cross-validation predictions from base models
C. Using a separate holdout set only
D. Using the entire training data without splitting

Solution

  1. Step 1: Recall stacking training method

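    Stacking trains the final model (the meta-model) on out-of-fold predictions: cross-validation ensures each training sample is predicted by a base model that did not see that sample during fitting.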
  2. Step 2: Compare options to stacking method

    Only option B, "Using cross-validation predictions from base models", mentions cross-validation predictions, which are key to stacking.
  3. Final Answer:

    Using cross-validation predictions from base models -> Option B
  4. Quick Check:

    Stacking uses cross-validation predictions [OK]
Hint: Stacking uses cross-validation predictions for final model [OK]
Common Mistakes:
  • Confusing stacking with blending's holdout set
  • Thinking stacking uses entire data without splits
  • Assuming random feature subsets are used
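The cross-validation step can be sketched by hand with cross_val_predict. This is a minimal example on synthetic data. The two base models and the names oof_columns and X_meta are assumptions for illustration, not a prescribed setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Synthetic training data (an assumption for illustration only)
X_train, y_train = make_classification(n_samples=200, n_features=6, random_state=1)

base_models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=25, random_state=1),
]

# Out-of-fold probabilities: each row is predicted by a model
# that was fitted on the other four folds only
oof_columns = [
    cross_val_predict(model, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for model in base_models
]
X_meta = np.column_stack(oof_columns)

# The meta-model is trained on the out-of-fold predictions
meta_model = LogisticRegression()
meta_model.fit(X_meta, y_train)

print(X_meta.shape)
```

To predict on new data, the base models would then be refitted on all of X_train and their predictions passed to meta_model.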
3. Given the following code snippet for blending, what will be the shape of X_blend_train if X_train has shape (1000, 10) and holdout_ratio=0.2?
from sklearn.model_selection import train_test_split
X_train_full, X_holdout, y_train_full, y_holdout = train_test_split(X_train, y_train, test_size=holdout_ratio, random_state=42)
# Base model predictions on holdout
base_pred_holdout = base_model.predict(X_holdout)
# Blending training data
X_blend_train = base_pred_holdout.reshape(-1, 1)
medium
A. (200, 1)
B. (800, 1)
C. (1000, 1)
D. (200, 10)

Solution

  1. Step 1: Calculate holdout set size

    With 1000 samples and 0.2 holdout ratio, holdout size = 1000 * 0.2 = 200 samples.
  2. Step 2: Determine shape of base model predictions

    Base model predicts on holdout set, so predictions have shape (200,). Reshaping to (-1, 1) makes it (200, 1).
  3. Final Answer:

    (200, 1) -> Option A
  4. Quick Check:

    Holdout size 200, reshape to (200,1) [OK]
Hint: Holdout size = total * ratio; reshape predictions accordingly [OK]
Common Mistakes:
  • Using full training size instead of holdout size
  • Confusing reshape dimensions
  • Assuming predictions keep original feature count
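The shape arithmetic can be checked with a runnable version of the snippet. The random data and the LogisticRegression base model are assumptions added here, since the original snippet leaves X_train, y_train, and base_model undefined.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Random data with the shape from the question (assumed for illustration)
rng = np.random.default_rng(42)
X_train = rng.normal(size=(1000, 10))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
holdout_ratio = 0.2

X_train_full, X_holdout, y_train_full, y_holdout = train_test_split(
    X_train, y_train, test_size=holdout_ratio, random_state=42
)

# The base model is fitted on the 800 rows left after the holdout is set aside
base_model = LogisticRegression(max_iter=1000).fit(X_train_full, y_train_full)

# One prediction per holdout row, reshaped into a single feature column
base_pred_holdout = base_model.predict(X_holdout)
X_blend_train = base_pred_holdout.reshape(-1, 1)

print(X_train_full.shape, X_holdout.shape, X_blend_train.shape)
# (800, 10) (200, 10) (200, 1)
```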
4. You wrote this stacking code but get an error: ValueError: Found input variables with inconsistent numbers of samples. What is the likely cause?
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

base1 = LogisticRegression()
base2 = RandomForestClassifier()

pred1 = cross_val_predict(base1, X_train, y_train, cv=5)
pred2 = cross_val_predict(base2, X_train, y_train, cv=5)

X_meta = np.column_stack((pred1, pred2))
meta_model = LogisticRegression()
meta_model.fit(X_meta, y_train)
medium
A. Meta model cannot be logistic regression
B. Base models are not fitted before predictions
C. Using cross_val_predict with cv=5 is invalid
D. Base model predictions have different lengths than y_train

Solution

  1. Step 1: Understand cross_val_predict output

    cross_val_predict returns one prediction per sample in X_train, so pred1 and pred2 should each have the same length as X_train.
  2. Step 2: Identify cause of inconsistent sample sizes

    If pred1 or pred2 has a different length than y_train, X_meta and y_train disagree on the number of samples, and meta_model.fit raises this ValueError.
  3. Final Answer:

    Base model predictions have different lengths than y_train -> Option D
  4. Quick Check:

    Prediction length mismatch causes ValueError [OK]
Hint: Check prediction and label lengths match before stacking [OK]
Common Mistakes:
  • Assuming models must be pre-fitted before cross_val_predict
  • Thinking cv=5 is invalid for cross_val_predict
  • Believing meta model type causes this error
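One way to catch this early is to compare lengths before fitting. The sketch below reproduces the error on purpose with random stand-in predictions. The arrays and the truncated label vector y_wrong are assumptions made up for this illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in predictions and labels (random values, assumed for illustration)
rng = np.random.default_rng(0)
y_train = rng.integers(0, 2, size=100)
pred1 = rng.integers(0, 2, size=100)
pred2 = rng.integers(0, 2, size=100)

X_meta = np.column_stack((pred1, pred2))

# Cheap sanity check: one row of meta-features per label
lengths_match = X_meta.shape[0] == y_train.shape[0]
print("lengths match:", lengths_match)

# Deliberately pass labels with the wrong length to trigger the error
y_wrong = y_train[:80]
try:
    LogisticRegression().fit(X_meta, y_wrong)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```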
5. You want to blend three base models using a holdout set. Which approach correctly prepares the training data for the blender model?
hard
A. Train blender on base model predictions from full training data without holdout
B. Train base models on holdout set, predict on full training data, then train blender on full predictions
C. Train base models on full training data, predict on holdout, then train blender on holdout predictions
D. Train blender on random subsets of base model predictions without holdout or cross-validation

Solution

  1. Step 1: Understand blending process

    Blending trains base models on the full training portion (the data remaining after the holdout set is set aside), then uses their predictions on the separate holdout set to train the blender model.
  2. Step 2: Evaluate options against blending steps

    Only option C, "Train base models on full training data, predict on holdout, then train blender on holdout predictions", matches all three steps: training the base models on the training data, predicting on the holdout, and training the blender on those predictions.
  3. Final Answer:

    Train base models on full training data, predict on holdout, then train blender on holdout predictions -> Option C
  4. Quick Check:

    Blending uses holdout predictions for blender training [OK]
Hint: Blending trains blender on holdout predictions from full-trained base models [OK]
Common Mistakes:
  • Training base models on holdout instead of full data
  • Training blender without holdout predictions
  • Ignoring holdout set in blending
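The blending steps above can be sketched end to end as follows. The synthetic data, the three base models, the extra test split, and the helper base_features are all assumptions introduced for this example. Probabilities are used as blender inputs here, although class labels would also work.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (an assumption for illustration only)
X, y = make_classification(n_samples=500, n_features=10, random_state=7)

# Split off a test set, then carve a holdout set out of the remainder
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=7)
X_fit, X_holdout, y_fit, y_holdout = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=7
)

# Step 1: train the base models on data that excludes the holdout set
base_models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=25, random_state=7),
    KNeighborsClassifier(n_neighbors=5),
]
for model in base_models:
    model.fit(X_fit, y_fit)


def base_features(models, X_input):
    """Hypothetical helper: one column of class-1 probabilities per base model."""
    return np.column_stack([m.predict_proba(X_input)[:, 1] for m in models])


# Step 2: predict on the holdout set to build the blender's training data
X_blend_train = base_features(base_models, X_holdout)

# Step 3: train the blender on the holdout predictions and holdout labels
blender = LogisticRegression().fit(X_blend_train, y_holdout)

# Final predictions on unseen data go through the same two stages
final_preds = blender.predict(base_features(base_models, X_test))
print(X_blend_train.shape, final_preds.shape)
```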