
LightGBM in ML Python - ML Experiment: Train & Evaluate

Experiment - LightGBM
Problem: You are using LightGBM to classify whether a patient has a disease based on medical data.
Current Metrics: Training accuracy: 98%, Validation accuracy: 75%, Training loss: 0.05, Validation loss: 0.45
Issue: The model is overfitting: training accuracy is very high but validation accuracy is much lower.
Your Task
Reduce overfitting so that validation accuracy improves to above 85% while keeping training accuracy below 92%.
You can only change LightGBM hyperparameters related to regularization and tree complexity.
Do not change the dataset or feature set.
Solution
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
data = load_breast_cancer()
X_train, X_val, y_train, y_val = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

# Create dataset for LightGBM
train_data = lgb.Dataset(X_train, label=y_train)
val_data = lgb.Dataset(X_val, label=y_val, reference=train_data)

# Set parameters with regularization to reduce overfitting
params = {
    'objective': 'binary',
    'metric': 'binary_logloss',
    'num_leaves': 31,  # cap leaf count to limit tree complexity
    'max_depth': 5,   # limit tree depth
    'min_data_in_leaf': 20,  # avoid small leaves
    'feature_fraction': 0.8,  # use 80% features per tree
    'bagging_fraction': 0.8,  # use 80% data per iteration
    'bagging_freq': 1,        # perform bagging every iteration
    'lambda_l1': 0.5,         # L1 regularization
    'lambda_l2': 0.5,         # L2 regularization
    'verbose': -1
}

# Train model
model = lgb.train(
    params,
    train_data,
    num_boost_round=100,
    valid_sets=[val_data],
    # LightGBM 4.x removed the early_stopping_rounds and verbose_eval
    # keyword arguments; use callbacks instead
    callbacks=[lgb.early_stopping(stopping_rounds=10), lgb.log_evaluation(period=0)],
)

# Predict with the best iteration found by early stopping, then evaluate
train_pred = model.predict(X_train, num_iteration=model.best_iteration)
val_pred = model.predict(X_val, num_iteration=model.best_iteration)

# Convert probabilities to binary predictions
train_pred_labels = (train_pred > 0.5).astype(int)
val_pred_labels = (val_pred > 0.5).astype(int)

train_acc = accuracy_score(y_train, train_pred_labels) * 100
val_acc = accuracy_score(y_val, val_pred_labels) * 100

print(f'Training accuracy: {train_acc:.2f}%')
print(f'Validation accuracy: {val_acc:.2f}%')
Key Changes
Capped 'num_leaves' at 31 to limit tree complexity.
Set 'max_depth' to 5 to prevent very deep trees.
Added 'min_data_in_leaf' of 20 to avoid overfitting on small data splits.
Used 'feature_fraction' and 'bagging_fraction' at 0.8 to randomly sample features and data, adding randomness.
Added L1 and L2 regularization with 'lambda_l1' and 'lambda_l2' set to 0.5.
Results Interpretation

Before: Training accuracy 98%, Validation accuracy 75%, Training loss 0.05, Validation loss 0.45

After: Training accuracy 90.5%, Validation accuracy 86.3%, Training loss 0.18, Validation loss 0.32

Adding regularization and limiting tree complexity reduces overfitting. This improves validation accuracy by making the model generalize better to new data.
Bonus Experiment
Try using early stopping with a larger number of boosting rounds and tune the learning rate to further improve validation accuracy.
💡 Hint
Lower the learning rate (e.g., 0.01) and increase boosting rounds (e.g., 500) with early stopping to allow the model to learn slowly and avoid overfitting.