MLOps · How-To · Beginner · 3 min read

How to Use LightGBM Classifier in Python with sklearn

To use LGBMClassifier in Python, first install the lightgbm package, then import the class and create an instance. Fit the model on training data with fit() and generate predictions with predict(), just like any other sklearn classifier.
📐 Syntax

The basic syntax to use LGBMClassifier involves importing the class, creating an object with optional parameters, fitting it on training data, and predicting labels on new data.

  • LGBMClassifier(): Creates the model object.
  • fit(X_train, y_train): Trains the model on features X_train and labels y_train.
  • predict(X_test): Predicts labels for new features X_test.
python
from lightgbm import LGBMClassifier

model = LGBMClassifier(
    boosting_type='gbdt',  # Gradient Boosting Decision Tree
    num_leaves=31,         # Maximum leaves in one tree
    max_depth=-1,          # No limit on tree depth
    learning_rate=0.1,     # Step size shrinkage
    n_estimators=100       # Number of boosting rounds
)

model.fit(X_train, y_train)
predictions = model.predict(X_test)
💻 Example

This example shows how to train a LightGBM classifier on the Iris dataset and evaluate its accuracy.

python
from lightgbm import LGBMClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create model
model = LGBMClassifier(n_estimators=50, learning_rate=0.1, random_state=42)

# Train model
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
Output
Accuracy: 1.00
⚠️ Common Pitfalls

Common mistakes when using LGBMClassifier include:

  • Not installing the lightgbm package before import.
  • Mishandling missing values: LightGBM treats NaN in features natively, but sentinel encodings such as -999 or empty strings must be converted to NaN or imputed first.
  • Using incompatible data types like strings instead of numeric arrays.
  • Confusing predict_proba() output with predict() labels.

Always check data format and install dependencies first.

python
try:
    from lightgbm import LGBMClassifier
except ImportError:
    print("Please install lightgbm package first using 'pip install lightgbm'")

# Wrong: passing string-typed feature columns without encoding
# X_train = [['red', 1.2], ['blue', 3.4]]  # object-dtype features raise an error

# Right: encode categorical features numerically first
# from sklearn.preprocessing import OrdinalEncoder
# X_train_encoded = OrdinalEncoder().fit_transform(X_train)
Output
Please install lightgbm package first using 'pip install lightgbm'
📊 Quick Reference

Here is a quick summary of key parameters for LGBMClassifier:

| Parameter     | Description                                          | Default |
|---------------|------------------------------------------------------|---------|
| boosting_type | Type of boosting algorithm ('gbdt', 'dart', 'goss')  | 'gbdt'  |
| num_leaves    | Maximum number of leaves in one tree                 | 31      |
| max_depth     | Maximum tree depth (-1 means no limit)               | -1      |
| learning_rate | Step size shrinkage                                  | 0.1     |
| n_estimators  | Number of boosting rounds                            | 100     |
| random_state  | Seed for reproducibility                             | None    |

Key Takeaways

  • Install the lightgbm package before importing LGBMClassifier.
  • Use fit() to train and predict() to get class predictions.
  • LightGBM handles numeric data; encode categorical features properly.
  • Tune parameters like num_leaves and learning_rate for better results.
  • Check your data format and handle missing values if needed.