
LightGBM in ML Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Understanding LightGBM's Leaf-wise Tree Growth
LightGBM grows trees using a leaf-wise strategy instead of a level-wise one. What is the main advantage of this leaf-wise growth compared to level-wise growth?
A. It ensures all leaves grow equally, maintaining balanced trees.
B. It reduces overfitting by strictly limiting tree depth.
C. It uses random feature selection to reduce training time.
D. It splits the leaf with the largest loss reduction at each step, so with the same number of leaves it typically reaches lower loss, improving accuracy and efficiency.
💡 Hint
Think about how growing the leaf with the largest loss reduction affects tree shape and performance.
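To make the idea concrete, here is a toy, pure-Python sketch of leaf-wise (best-first) growth on 1-D regression data. It is not LightGBM's implementation, which uses histogram-based split finding over many features; `sse`, `best_split`, and `grow_leaf_wise` are hypothetical names chosen for this illustration. At each step the sketch splits whichever current leaf offers the largest reduction in squared error, and stops once `num_leaves` leaves exist, the same kind of budget LightGBM's `num_leaves` parameter sets.

```python
import heapq

def sse(ys):
    """Squared-error loss of a leaf that predicts the mean of its targets."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(points):
    """Best threshold split of (x, y) points: returns (gain, threshold, left, right) or None."""
    pts = sorted(points)
    parent = sse([y for _, y in pts])
    best = None
    for i in range(1, len(pts)):
        if pts[i - 1][0] == pts[i][0]:
            continue  # equal x values cannot be separated by a threshold
        left, right = pts[:i], pts[i:]
        gain = parent - sse([y for _, y in left]) - sse([y for _, y in right])
        if best is None or gain > best[0]:
            best = (gain, (pts[i - 1][0] + pts[i][0]) / 2, left, right)
    return best

def grow_leaf_wise(points, num_leaves):
    """Best-first growth: repeatedly split the leaf whose best split reduces loss the most."""
    finished = []  # (depth, points) for leaves that have no useful split
    heap = []      # candidates as (-gain, tie_breaker, depth, points, split); heapq is a min-heap
    counter = 0

    def push(depth, pts):
        nonlocal counter
        split = best_split(pts)
        if split is None or split[0] <= 0:
            finished.append((depth, pts))
        else:
            heapq.heappush(heap, (-split[0], counter, depth, pts, split))
            counter += 1

    push(0, points)
    while heap and len(finished) + len(heap) < num_leaves:
        _, _, depth, _, (_, _, left, right) = heapq.heappop(heap)
        push(depth + 1, left)
        push(depth + 1, right)
    # Candidates still in the heap stay leaves once the leaf budget is used up
    return finished + [(depth, pts) for _, _, depth, pts, _ in heap]

# Flat targets on the left, rapidly changing targets on the right
data = [(float(x), 0.0) for x in range(8)] + [(float(x), float(x - 7) ** 2) for x in range(8, 16)]
leaves = grow_leaf_wise(data, num_leaves=4)
for depth, pts in leaves:
    xs = [x for x, _ in pts]
    print(f"depth={depth}  n={len(pts)}  x in [{min(xs)}, {max(xs)}]")
```

A leaf with no useful split (such as a run of identical targets) is never chosen, so the splitting budget goes where the loss is, and the resulting tree need not be balanced. That freedom is also why leaf-wise trees can overfit small datasets.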
Hyperparameter
intermediate
Choosing the Right LightGBM Hyperparameter for Overfitting Control
Which LightGBM hyperparameter primarily controls the maximum depth of the trees to prevent overfitting?
A. max_depth
B. num_leaves
C. min_data_in_leaf
D. learning_rate
💡 Hint
This parameter limits how deep the tree can grow.
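Because LightGBM grows trees leaf-wise, `max_depth` is usually set alongside `num_leaves`, which is the main complexity control (the default `max_depth` of -1 means no depth limit). A binary tree of depth d has at most 2**d leaves, so when both are set, the smaller bound wins. The helper `max_reachable_leaves` is a hypothetical name for this back-of-the-envelope check, and the parameter values are illustrative, not recommendations.

```python
def max_reachable_leaves(num_leaves: int, max_depth: int) -> int:
    """Upper bound on leaves per tree given LightGBM-style num_leaves and max_depth.

    A binary tree of depth d has at most 2**d leaves; max_depth <= 0 means no depth limit.
    """
    if max_depth <= 0:
        return num_leaves
    return min(num_leaves, 2 ** max_depth)

# Illustrative LightGBM parameter dict (values are examples only)
params = {"objective": "binary", "num_leaves": 31, "max_depth": 4, "min_data_in_leaf": 20}
print(max_reachable_leaves(params["num_leaves"], params["max_depth"]))  # 16: depth 4 caps the tree
```

Here `max_depth=4` is the binding constraint, so raising `num_leaves` above 16 would change nothing. LightGBM's parameter-tuning guide suggests keeping `num_leaves` below 2**max_depth when both are used, with `min_data_in_leaf` as a further guard against overfitting.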
Predict Output
advanced
Output of LightGBM Training with Early Stopping
What will be the output of the following Python code snippet using LightGBM?
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
train_data = lgb.Dataset(X_train, label=y_train)
val_data = lgb.Dataset(X_val, label=y_val, reference=train_data)

params = {'objective': 'binary', 'metric': 'binary_logloss', 'verbose': -1}

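# Early stopping is passed as a callback (LightGBM 4.x removed the early_stopping_rounds and verbose_eval arguments)
model = lgb.train(params, train_data, num_boost_round=100, valid_sets=[val_data],
                  callbacks=[lgb.early_stopping(stopping_rounds=10, verbose=False)])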
preds = model.predict(X_val, num_iteration=model.best_iteration)
preds_binary = (preds > 0.5).astype(int)
acc = accuracy_score(y_val, preds_binary)
print(f"Accuracy: {acc:.3f}")
A. Accuracy: 0.965
B. Accuracy: 0.850
C. Accuracy: 0.500
D. Raises a TypeError due to wrong parameter
💡 Hint
The breast cancer dataset is easy to classify with LightGBM, expect high accuracy.
Metrics
advanced
Interpreting LightGBM's Binary Logloss Metric
During LightGBM training for a binary classification task, the binary_logloss metric decreases from 0.6 to 0.2. What does this change indicate about the model?
A. The model's predictions are becoming less accurate.
B. The model's predicted probabilities are closer to the true labels.
C. The model is overfitting the training data.
D. The model's accuracy is exactly 80%.
💡 Hint
Lower logloss means better probability estimates.
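Binary log loss averages -[y log p + (1 - y) log(1 - p)] over samples, where p is the predicted probability of the positive class. The pure-Python sketch below (`binary_logloss` is an illustrative name, and the clipping constant is an assumption to avoid log(0)) scores two prediction sets that are both 100% accurate at a 0.5 threshold yet have very different log loss, which is why a log loss value does not pin down accuracy.

```python
import math

def binary_logloss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

y = [1, 0, 1, 0]
hesitant = [0.6, 0.4, 0.6, 0.4]   # right side of 0.5, but not confident
confident = [0.9, 0.1, 0.9, 0.1]  # right side of 0.5 and confident

print(round(binary_logloss(y, hesitant), 4))   # 0.5108
print(round(binary_logloss(y, confident), 4))  # 0.1054
```

A handy reading: exp(-logloss) is the geometric mean of the probability the model assigns to the true class, so a drop from 0.6 to 0.2 means that average rises from roughly 0.55 to roughly 0.82.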
🔧 Debug
expert
Identifying the Cause of a LightGBM Training Error
You run the following LightGBM training code, and training fails with an error saying the label should be a list, a 1-D NumPy array, or a pandas Series. What is the most likely cause?
import lightgbm as lgb
import numpy as np

X = np.random.rand(100, 5)
y = np.random.rand(100, 2)  # shape is (100, 2): a 2-D label array
train_data = lgb.Dataset(X, label=y)
params = {'objective': 'regression'}
model = lgb.train(params, train_data, num_boost_round=10)
A. The num_boost_round value is too low to start training.
B. The feature matrix X has too many columns for LightGBM.
C. The label array y should be 1-dimensional, but it is 2-dimensional.
D. The objective parameter 'regression' is invalid in LightGBM.
💡 Hint
Check the shape of the label array passed to Dataset.
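A defensive habit is to normalize the label shape yourself before building the `Dataset`. The sketch below uses NumPy only; `to_1d_label` is a hypothetical helper, not part of LightGBM. It flattens a single-column array such as shape (100, 1) and rejects any other 2-D shape, since that usually means the wrong array was passed as the label.

```python
import numpy as np

def to_1d_label(y):
    """Return y as a 1-D NumPy array, flattening an (n, 1) column vector."""
    y = np.asarray(y)
    if y.ndim == 1:
        return y
    if y.ndim == 2 and y.shape[1] == 1:
        return y.ravel()  # (n, 1) -> (n,)
    raise ValueError(f"label must be 1-D (or a single column), got shape {y.shape}")

rng = np.random.default_rng(0)
y_col = rng.random((100, 1))
print(to_1d_label(y_col).shape)  # (100,)

try:
    to_1d_label(rng.random((100, 2)))
except ValueError as err:
    print(err)
```

The cleaned array would then be passed as `lgb.Dataset(X, label=to_1d_label(y))`.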

Practice

1. What is the main purpose of LightGBM in machine learning?
easy
A. To preprocess data by scaling features
B. To build fast and accurate decision tree models
C. To perform image recognition using neural networks
D. To cluster data points without labels

Solution

  1. Step 1: Understand LightGBM's role

    LightGBM is a gradient boosting framework that builds ensembles of decision trees quickly and accurately.
  2. Step 2: Compare with other options

    Options A, C, and D describe other tasks (feature scaling, neural-network image recognition, and unsupervised clustering) that LightGBM is not designed for.
  3. Final Answer:

    To build fast and accurate decision tree models -> Option B
  4. Quick Check:

    LightGBM purpose = fast, accurate trees [OK]
Hint: LightGBM is known for fast tree models [OK]
Common Mistakes:
  • Confusing LightGBM with neural networks
  • Thinking LightGBM is for data scaling
  • Assuming LightGBM does clustering
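Under the hood, LightGBM is a gradient boosting framework: each new tree is fit to the errors (negative gradients) left by the trees before it, and its output is added with a learning rate. The toy sketch below shows that loop for squared error using one-split trees (stumps) in plain Python. It omits everything that makes LightGBM fast, such as histogram-based split finding and leaf-wise growth with many leaves, and `fit_stump`, `boost`, and `mse` are hypothetical names.

```python
def fit_stump(xs, residuals):
    """Best single split on 1-D inputs for squared error: returns (threshold, left_value, right_value)."""
    pairs = sorted(zip(xs, residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal x values cannot be separated
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, (pairs[i - 1][0] + pairs[i][0]) / 2, lv, rv)
    _, thr, lv, rv = best
    return thr, lv, rv

def boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Gradient boosting for squared error: each stump is fit to the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # y - p is the negative gradient of the loss (y - p)**2 / 2
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, lv, rv = fit_stump(xs, residuals)
        stumps.append((thr, lv, rv))
        preds = [p + learning_rate * (lv if x <= thr else rv) for x, p in zip(xs, preds)]
    return base, stumps, preds

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

xs = [i / 10 for i in range(30)]
ys = [x * x for x in xs]
base, stumps, preds = boost(xs, ys)
print(f"MSE of constant baseline: {mse(ys, [base] * len(ys)):.4f}")
print(f"MSE after {len(stumps)} boosting rounds: {mse(ys, preds):.4f}")
```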
2. Which of the following is the correct way to import LightGBM in Python?
easy
A. import lightgbm as lgb
B. import LightGBM
C. from lightgbm import LightGBM
D. import lgbm

Solution

  1. Step 1: Recall LightGBM import syntax

    The standard way is to import the package as import lightgbm as lgb.
  2. Step 2: Check other options

    Options B, C, and D are incorrect because they use wrong module names or syntax.
  3. Final Answer:

    import lightgbm as lgb -> Option A
  4. Quick Check:

    Standard import = import lightgbm as lgb [OK]
Hint: Use lowercase 'lightgbm' and alias 'lgb' [OK]
Common Mistakes:
  • Using capital letters in import
  • Trying to import names that do not exist, such as a LightGBM class
  • Using wrong alias names
3. What will be the output of this code snippet?
import lightgbm as lgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
train_data = lgb.Dataset(X_train, label=y_train)
params = {'objective': 'multiclass', 'num_class': 3, 'verbose': -1}
model = lgb.train(params, train_data, num_boost_round=10)
preds = model.predict(X_test)
preds_labels = preds.argmax(axis=1)
print(accuracy_score(y_test, preds_labels))
medium
A. An exception because of wrong parameter names
B. A list of predicted class labels
C. A syntax error due to missing import
D. A float value between 0 and 1 representing accuracy

Solution

  1. Step 1: Understand the code flow

    The code trains a LightGBM multiclass model on the iris data, predicts class probabilities for the test set, converts them to labels with argmax, and then computes the accuracy.
  2. Step 2: Identify output type

    The print statement outputs accuracy_score, which is a float between 0 and 1.
  3. Final Answer:

    A float value between 0 and 1 representing accuracy -> Option D
  4. Quick Check:

    accuracy_score output = float between 0 and 1 [OK]
Hint: Accuracy score prints float between 0 and 1 [OK]
Common Mistakes:
  • Confusing predicted labels with accuracy output
  • Expecting a list instead of a float
  • Thinking code has syntax errors
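With the `'multiclass'` objective, `model.predict` returns one row of class probabilities per sample, with `num_class` columns; `argmax(axis=1)` picks the most probable class, and accuracy is the fraction of matches. The NumPy sketch below reproduces those last steps on a made-up 3-class probability matrix (the numbers are illustrative, not LightGBM output).

```python
import numpy as np

# Made-up predict() output for 3 samples and 3 classes; each row sums to 1
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
])
y_true = np.array([0, 2, 2])

pred_labels = probs.argmax(axis=1)           # most probable class per row
accuracy = float(np.mean(pred_labels == y_true))
print(pred_labels, round(accuracy, 3))       # [0 2 1] 0.667
```

`sklearn.metrics.accuracy_score(y_true, pred_labels)` returns the same fraction, which is why the quiz code prints a single float.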
4. Identify the error in this LightGBM training code:
import lightgbm as lgb
train_data = lgb.Dataset(X_train, label=y_train)
params = {'objective': 'binary'}
model = lgb.train(params, train_data, num_round=100)
medium
A. The 'objective' value 'binary' is invalid
B. The Dataset object is missing 'feature_name' argument
C. The parameter 'num_round' should be 'num_boost_round'
D. The import statement is incorrect

Solution

  1. Step 1: Check LightGBM training parameters

    The correct parameter for number of boosting rounds is 'num_boost_round', not 'num_round'.
  2. Step 2: Verify other parts

    'binary' is a valid objective, 'feature_name' is optional, and import is correct.
  3. Final Answer:

    The parameter 'num_round' should be 'num_boost_round' -> Option C
  4. Quick Check:

    Correct parameter name = num_boost_round [OK]
Hint: Use 'num_boost_round' for training rounds [OK]
Common Mistakes:
  • Using 'num_round' instead of 'num_boost_round'
  • Thinking 'binary' objective is invalid
  • Adding unnecessary parameters
5. You want to improve LightGBM model accuracy on a classification task. Which combination of actions is best?
hard
A. Increase num_boost_round and tune learning_rate
B. Decrease num_boost_round and remove categorical features
C. Use default parameters without tuning
D. Train with fewer data samples to reduce overfitting

Solution

  1. Step 1: Understand model tuning

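    Increasing the number of boosting rounds while tuning the learning rate (often lowering it) lets the model fit finer patterns; monitoring a validation set, for example with early stopping, keeps the extra rounds from overfitting.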
  2. Step 2: Evaluate other options

    Decreasing rounds or removing categorical features usually harms accuracy, and training on fewer samples gives the model less information to learn from.
  3. Final Answer:

    Increase num_boost_round and tune learning_rate -> Option A
  4. Quick Check:

    Tuning rounds and learning rate improves accuracy [OK]
Hint: Tune rounds and learning rate for better accuracy [OK]
Common Mistakes:
  • Reducing training data to fix overfitting
  • Ignoring categorical features
  • Not tuning parameters at all