How to Use Optuna for Hyperparameter Tuning in Python
Use optuna.create_study() to start a tuning study, define an objective function that trains your sklearn model with parameters suggested by the trial object, then call study.optimize(objective, n_trials=...) to search for the best hyperparameters automatically.

Syntax
Optuna tuning involves creating a study, defining an objective function that takes a trial object, and running optimization. The key parts are:
- optuna.create_study(): creates a study to manage trials.
- objective(trial): a function where you suggest hyperparameters and return a metric to minimize or maximize.
- study.optimize(objective, n_trials): runs the tuning process for a set number of trials.
```python
import optuna

def objective(trial):
    param = trial.suggest_float('param', 0.0, 1.0)
    score = some_model_training_and_evaluation(param)  # your own training/evaluation code
    return score

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=100)
```
Example
This example shows how to tune the max_depth and n_estimators hyperparameters of a RandomForestClassifier on the iris dataset using Optuna.
```python
import optuna
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, StratifiedKFold

data = load_iris()
X, y = data.data, data.target

def objective(trial):
    max_depth = trial.suggest_int('max_depth', 2, 32)
    n_estimators = trial.suggest_int('n_estimators', 10, 200)
    clf = RandomForestClassifier(max_depth=max_depth, n_estimators=n_estimators, random_state=42)
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
    scores = cross_val_score(clf, X, y, cv=cv, scoring='accuracy')
    return 1.0 - np.mean(scores)  # minimize error

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=20)

print('Best parameters:', study.best_params)
print('Best accuracy:', 1 - study.best_value)
```
Output
Best parameters: {'max_depth': 11, 'n_estimators': 109}
Best accuracy: 0.9666666666666667
Common Pitfalls
Common mistakes when using Optuna include:
- Not returning the correct metric from the objective function (it should be a single scalar to minimize or maximize).
- Forgetting to set direction='minimize' or 'maximize' in create_study() to match your metric.
- Using non-deterministic model training without fixing random seeds, causing unstable results.
- Not using cross-validation, which can lead to overfitting on a single train/test split.
Example of a wrong objective function and the corrected version:
```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, StratifiedKFold

X, y = load_iris(return_X_y=True)

def wrong_objective(trial):
    max_depth = trial.suggest_int('max_depth', 2, 32)
    clf = RandomForestClassifier(max_depth=max_depth)
    clf.fit(X, y)
    # Returns accuracy (to maximize), but create_study() defaults to minimize,
    # and scoring on the training data overstates performance.
    return clf.score(X, y)

def correct_objective(trial):
    max_depth = trial.suggest_int('max_depth', 2, 32)
    clf = RandomForestClassifier(max_depth=max_depth, random_state=42)
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
    scores = cross_val_score(clf, X, y, cv=cv, scoring='accuracy')
    return 1.0 - np.mean(scores)  # minimize error
```
Quick Reference
Tips for using Optuna effectively:
- Always define an objective(trial) function that returns a scalar metric.
- Use trial.suggest_int, trial.suggest_float, or trial.suggest_categorical to sample hyperparameters.
- Set direction='minimize' or 'maximize' in create_study() to match your goal.
- Use cross-validation inside the objective for robust evaluation.
- Set random seeds for reproducibility.
Key Takeaways
- Define an objective function that suggests hyperparameters and returns a scalar metric to optimize.
- Create a study with the correct direction ('minimize' or 'maximize') before running optimization.
- Use cross-validation inside the objective to get reliable performance estimates.
- Set random seeds to ensure reproducible tuning results.
- Use Optuna's suggest methods to explore different types of hyperparameters easily.