How to Use Bayesian Optimization in Python with scikit-learn
This guide shows how to use BayesSearchCV from the scikit-optimize library to perform Bayesian optimization for hyperparameter tuning in Python. It works like GridSearchCV but uses a smarter search strategy to find good model parameters in fewer iterations.

Syntax
The main tool for Bayesian optimization in Python with scikit-learn models is BayesSearchCV from the scikit-optimize library. It wraps your model and searches over specified hyperparameter ranges.
- estimator: Your machine learning model (e.g., RandomForestClassifier()).
- search_spaces: Dictionary mapping hyperparameter names to their ranges or options.
- n_iter: Number of parameter settings to try.
- cv: Cross-validation splitting strategy.
- scoring: Metric used to evaluate model performance.
from skopt import BayesSearchCV
from sklearn.ensemble import RandomForestClassifier

opt = BayesSearchCV(
    estimator=RandomForestClassifier(),
    search_spaces={
        'n_estimators': (10, 100),
        'max_depth': (1, 10),
        'min_samples_split': (2, 10)
    },
    n_iter=30,
    cv=3,
    scoring='accuracy'
)
Example
This example shows how to use BayesSearchCV to tune a RandomForestClassifier on the iris dataset. It searches for the best number of trees, tree depth, and minimum samples to split nodes, then prints the best score and parameters.
from skopt import BayesSearchCV
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Define model and search space
opt = BayesSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    search_spaces={
        'n_estimators': (10, 100),
        'max_depth': (1, 10),
        'min_samples_split': (2, 10)
    },
    n_iter=20,
    cv=3,
    scoring='accuracy',
    random_state=42
)

# Run optimization
opt.fit(X_train, y_train)

# Print results
print(f"Best accuracy: {opt.best_score_:.3f}")
print(f"Best parameters: {opt.best_params_}")
Common Pitfalls
1. Not installing scikit-optimize: You must install it with pip install scikit-optimize before using BayesSearchCV.
2. Using incorrect parameter ranges: Numeric ranges must be tuples of (low, high) bounds. A list is interpreted as a set of categorical choices, so [10, 100] tries only those two exact values rather than the range between them.
3. Confusing n_iter with number of folds: n_iter controls how many parameter sets are tested, not cross-validation folds.
4. Forgetting to set random_state: For reproducible results, always set random_state in both your model and BayesSearchCV.
from skopt import BayesSearchCV
from sklearn.ensemble import RandomForestClassifier

# Wrong: a list is treated as categorical, so only the exact
# values 10 and 100 are tried, not the range between them
opt_wrong = BayesSearchCV(
    estimator=RandomForestClassifier(),
    search_spaces={'n_estimators': [10, 100]},  # should be (10, 100)
    n_iter=10
)

# Right: a tuple defines a numeric range to sample from
opt_right = BayesSearchCV(
    estimator=RandomForestClassifier(),
    search_spaces={'n_estimators': (10, 100)},
    n_iter=10
)
Quick Reference
- Library: scikit-optimize (install with pip install scikit-optimize).
- Class: BayesSearchCV for Bayesian hyperparameter tuning.
- Key arguments: estimator, search_spaces, n_iter, cv, scoring, random_state.
- Use case: Efficiently find good hyperparameters with fewer trials than grid search.