MLOps · How-To · Beginner · 3 min read

How to Use LightGBM Classifier in Python with sklearn

To use LGBMClassifier in Python, first install the lightgbm package, then import the class and create an instance. Fit the model on training data with fit() and generate predictions with predict(), just like any other sklearn classifier.
📐 Syntax

The basic syntax to use LGBMClassifier involves importing the class, creating an object with optional parameters, fitting it on training data, and predicting labels on new data.

  • LGBMClassifier(): Creates the model object.
  • fit(X_train, y_train): Trains the model on features X_train and labels y_train.
  • predict(X_test): Predicts labels for new features X_test.
python
from lightgbm import LGBMClassifier

model = LGBMClassifier(
    boosting_type='gbdt',  # Gradient Boosting Decision Tree
    num_leaves=31,         # Maximum leaves in one tree
    max_depth=-1,          # No limit on tree depth
    learning_rate=0.1,     # Step size shrinkage
    n_estimators=100       # Number of boosting rounds
)

model.fit(X_train, y_train)
predictions = model.predict(X_test)
💻 Example

This example shows how to train a LightGBM classifier on the Iris dataset and evaluate its accuracy.

python
from lightgbm import LGBMClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create model
model = LGBMClassifier(n_estimators=50, learning_rate=0.1, random_state=42)

# Train model
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
Output
Accuracy: 1.00
⚠️ Common Pitfalls

Common mistakes when using LGBMClassifier include:

  • Not installing the lightgbm package before import.
  • Mishandling missing values: LightGBM treats NaN in features natively, but sentinel encodings such as -999 or empty strings must be converted to NaN or imputed first.
  • Using incompatible data types like strings instead of numeric arrays.
  • Confusing predict_proba() output with predict() labels.

Always check data format and install dependencies first.

python
try:
    from lightgbm import LGBMClassifier
except ImportError:
    print("Please install lightgbm package first using 'pip install lightgbm'")

# Wrong: passing string-typed feature columns without encoding
# X_train = [['red', 1.2], ['blue', 3.4]]  # object-dtype features raise an error

# Right: encode categorical features numerically first
# from sklearn.preprocessing import OrdinalEncoder
# X_train_encoded = OrdinalEncoder().fit_transform(X_train)
Output
Please install lightgbm package first using 'pip install lightgbm'
📊 Quick Reference

Here is a quick summary of key parameters for LGBMClassifier:

| Parameter     | Description                                          | Default |
|---------------|------------------------------------------------------|---------|
| boosting_type | Type of boosting algorithm ('gbdt', 'dart', 'goss')  | 'gbdt'  |
| num_leaves    | Maximum number of leaves in one tree                 | 31      |
| max_depth     | Maximum tree depth (-1 means no limit)               | -1      |
| learning_rate | Step size shrinkage                                  | 0.1     |
| n_estimators  | Number of boosting rounds                            | 100     |
| random_state  | Seed for reproducibility                             | None    |

Key Takeaways

  • Install the lightgbm package before importing LGBMClassifier.
  • Use fit() to train and predict() to get class predictions.
  • LightGBM handles numeric data; encode categorical features properly.
  • Tune parameters like num_leaves and learning_rate for better results.
  • Check your data format and handle missing values if needed.