Imagine you want to predict house prices. Gradient Boosting builds many small models one after another. What is the main way it improves predictions?
Think about how mistakes from earlier models guide the next ones.
Gradient Boosting builds models sequentially. Each new model learns to predict the errors (residuals) of the combined previous models, improving overall accuracy.
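The residual-fitting idea can be sketched by hand. The toy example below (an illustration of the principle, not scikit-learn's actual implementation; data and hyperparameters are arbitrary) fits ten depth-1 trees, each to the residuals of the running ensemble, so the training error shrinks as trees are added.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: a noisy quadratic relationship (illustrative values only).
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=200)

# Hand-rolled boosting: each shallow tree is fit to the residuals
# (errors) of the ensemble built so far.
prediction = np.full_like(y, y.mean())  # start from the mean prediction
learning_rate = 0.5
for _ in range(10):
    residuals = y - prediction
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
    prediction += learning_rate * stump.predict(X)

# Training MSE after boosting is far below the variance of y.
print(round(float(np.mean((y - prediction) ** 2)), 3))
```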
What will be the printed training loss after 3 iterations of this Gradient Boosting Regressor?
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=100, n_features=1, noise=5, random_state=42)
model = GradientBoostingRegressor(n_estimators=3, learning_rate=1.0, max_depth=1, random_state=42)
model.fit(X, y)
print(round(model.train_score_[-1], 2))
Look at how train_score_ stores deviance (loss) after each iteration.
The train_score_ attribute records the training loss (deviance) at each boosting iteration, so train_score_[-1] is the loss after the third and final tree. For this data and seed, the printed value is approximately 0.12.
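To see what train_score_ holds, a quick check with the same setup (assuming default squared-error loss) shows it has one entry per boosting iteration, and that the recorded training loss does not increase here, since each tree is fit to the current residuals:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=100, n_features=1, noise=5, random_state=42)
model = GradientBoostingRegressor(n_estimators=3, learning_rate=1.0,
                                  max_depth=1, random_state=42)
model.fit(X, y)

# One entry per boosting iteration: len(train_score_) == n_estimators.
print(len(model.train_score_))
# Training loss is non-increasing across the three iterations here.
print(bool(np.all(np.diff(model.train_score_) <= 0)))
```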
You want to use Gradient Boosting to predict a continuous target with complex nonlinear relationships. Which base learner is best to use?
Consider which model type can capture nonlinear patterns well and is commonly used in Gradient Boosting.
Gradient Boosting typically uses shallow decision trees as base learners because they can capture nonlinearities and interactions efficiently while being fast to train.
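As an illustration (dataset and hyperparameters chosen arbitrarily), shallow trees cope well with the nonlinear, interacting features of the Friedman #1 benchmark:

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Friedman #1 combines a sine interaction, a quadratic, and linear terms.
X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Shallow trees (max_depth=3 here) are the usual base learners:
# deep enough to model interactions, cheap enough to fit hundreds of them.
model = GradientBoostingRegressor(n_estimators=200, max_depth=3,
                                  learning_rate=0.1, random_state=0)
model.fit(X_tr, y_tr)
print(round(model.score(X_te, y_te), 2))
```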
What happens if you set the learning rate too high in a Gradient Boosting regression model?
Think about how a large step size affects the model updates.
A high learning rate makes each boosting step apply a large correction, so the ensemble fits the training data, including its noise, very quickly. Training becomes unstable and the model tends to overfit, reducing generalization.
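The effect can be made visible by comparing a small and a large step size on held-out data (dataset and parameter values below are illustrative choices, not from the question):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=10, noise=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
for lr in (0.05, 1.0):
    model = GradientBoostingRegressor(n_estimators=200, learning_rate=lr,
                                      random_state=0)
    model.fit(X_tr, y_tr)
    # Store (train R^2, test R^2) for each learning rate.
    scores[lr] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(f"learning_rate={lr}: train R^2={scores[lr][0]:.3f}, "
          f"test R^2={scores[lr][1]:.3f}")
```

With learning_rate=1.0 the training score saturates almost immediately while the test score lags behind it, the train/test gap that signals overfitting.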
Consider this code snippet:
from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor(n_estimators=100, loss='log_loss')
model.fit(X_train, y_train)
Why does it raise a ValueError?
Check the allowed loss functions for regression in Gradient Boosting.
'log_loss' is a classification loss, used by GradientBoostingClassifier. GradientBoostingRegressor only accepts regression losses: 'squared_error' (the default), 'absolute_error', 'huber', or 'quantile'.