How to Use XGBoost Regressor in Python with sklearn
To use `XGBRegressor` in Python, first install the xgboost package, then import `XGBRegressor` from `xgboost`. Create a model instance, fit it on training data with `fit()`, and make predictions with `predict()`.

Syntax

The basic syntax to use `XGBRegressor` involves importing the class, creating an instance with optional parameters, fitting the model on training data, and predicting on new data.
- `from xgboost import XGBRegressor`: Import `XGBRegressor` from `xgboost`.
- `model = XGBRegressor(params)`: Create the regressor with parameters like `n_estimators` (number of trees), `max_depth` (tree depth), and `learning_rate`.
- `model.fit(X_train, y_train)`: Train the model on features `X_train` and target `y_train`.
- `model.predict(X_test)`: Predict target values for new data `X_test`.
```python
from xgboost import XGBRegressor

model = XGBRegressor(n_estimators=100, max_depth=3, learning_rate=0.1)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```
Example
This example shows how to train an `XGBRegressor` on the California housing dataset and evaluate its performance using mean squared error. (The Boston housing dataset's `load_boston` loader was removed in scikit-learn 1.2, so `fetch_california_housing` is used instead.)

```python
from xgboost import XGBRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load dataset
housing = fetch_california_housing()
X, y = housing.data, housing.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Create and train model
model = XGBRegressor(n_estimators=100, max_depth=4, learning_rate=0.1, random_state=42)
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)

# Evaluate
mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error: {mse:.2f}")
```
Output
The script prints the mean squared error on the held-out test set. The exact value depends on your xgboost and scikit-learn versions.
Common Pitfalls
- Not installing xgboost: You must install the package with `pip install xgboost` before importing.
- Wrong data shape: Features must be a 2D array; targets a 1D array.
- Ignoring random state: For reproducible results, set `random_state`.
- Using default parameters blindly: Tune `n_estimators`, `max_depth`, and `learning_rate` for better results.
```python
from xgboost import XGBRegressor

# Wrong: calling predict before fit
model = XGBRegressor()
predictions = model.predict(X_test)  # This will raise an error

# Right way: fit the model first
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```
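The "wrong data shape" pitfall above most often appears when training on a single feature stored as a 1D array. A quick sketch of the fix using NumPy's `reshape` (the variable names and values are illustrative):

```python
import numpy as np

# A single feature stored as a 1D array of shape (4,) -- estimators expect 2D features
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])  # targets stay 1D

# Reshape the feature into a single column: shape (4, 1)
X = x.reshape(-1, 1)

print(X.shape)  # (4, 1)
print(y.shape)  # (4,)
```

`reshape(-1, 1)` lets NumPy infer the number of rows, so the same line works for any number of samples.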
Quick Reference
| Parameter | Description | Default |
|---|---|---|
| n_estimators | Number of trees to build | 100 |
| max_depth | Maximum depth of each tree | 6 |
| learning_rate | Step size shrinkage | 0.3 |
| random_state | Seed for reproducibility | None |
| objective | Learning task and objective | 'reg:squarederror' |
Key Takeaways
- Install xgboost and import `XGBRegressor` to start using it.
- Fit the model on training data before predicting.
- Tune parameters like `n_estimators` and `max_depth` for better accuracy.
- Ensure input data shapes are correct: 2D features and 1D targets.
- Set `random_state` for reproducible results.