MLOps · How-To · Beginner · 3 min read

How to Use XGBoost Regressor in Python with sklearn

To use XGBRegressor in Python, first install the xgboost package, then import XGBRegressor from xgboost. Create a model instance, fit it on training data with fit(), and predict with predict().

Syntax

The basic syntax to use XGBRegressor involves importing the class, creating an instance with optional parameters, fitting the model on training data, and predicting on new data.

  • import: Import XGBRegressor from xgboost.
  • model = XGBRegressor(params): Create the regressor with parameters like n_estimators (number of trees), max_depth (tree depth), and learning_rate.
  • model.fit(X_train, y_train): Train the model on features X_train and target y_train.
  • model.predict(X_test): Predict target values for new data X_test.
```python
from xgboost import XGBRegressor

model = XGBRegressor(n_estimators=100, max_depth=3, learning_rate=0.1)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

Example

This example trains an XGBRegressor on the California housing dataset and evaluates its performance with mean squared error. (The Boston housing dataset used in many older tutorials was removed from scikit-learn in version 1.2; fetch_california_housing is used here instead.)

```python
from xgboost import XGBRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load dataset (load_boston was removed in scikit-learn 1.2)
X, y = fetch_california_housing(return_X_y=True)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train model
model = XGBRegressor(n_estimators=100, max_depth=4, learning_rate=0.1, random_state=42)
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)

# Evaluate
mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error: {mse:.2f}")
```
Output
Mean Squared Error: ≈0.23 (the exact value varies slightly across xgboost versions)

Common Pitfalls

  • Not installing xgboost: You must install the package with pip install xgboost before importing.
  • Wrong data shape: Features must be 2D arrays; targets 1D arrays.
  • Ignoring random state: For reproducible results, set random_state.
  • Using default parameters blindly: Tune n_estimators, max_depth, and learning_rate for better results.
```python
from xgboost import XGBRegressor

# Wrong: calling predict before fit
model = XGBRegressor()
predictions = model.predict(X_test)  # raises an error: the model has not been fitted

# Right: fit first, then predict
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

Quick Reference

| Parameter | Description | Default |
| --- | --- | --- |
| n_estimators | Number of trees to build | 100 |
| max_depth | Maximum depth of each tree | 6 |
| learning_rate | Step-size shrinkage | 0.3 |
| random_state | Seed for reproducibility | None |
| objective | Learning task and objective | 'reg:squarederror' |

Key Takeaways

  • Install xgboost and import XGBRegressor to start using it.
  • Fit the model on training data before predicting.
  • Tune parameters like n_estimators and max_depth for better accuracy.
  • Ensure input data shapes are correct: 2D features and 1D targets.
  • Set random_state for reproducible results.