MLOps · How-To · Beginner · 4 min read

How to Use Optuna for Hyperparameter Tuning in Python

Use optuna.create_study() to start a tuning study, define an objective function that trains your sklearn model with hyperparameters suggested by the trial object, and call study.optimize(objective, n_trials=...) to search for the best hyperparameters automatically.
📐

Syntax

Optuna tuning involves creating a study, defining an objective function that takes a trial object, and running optimization. The key parts are:

  • optuna.create_study(): creates a study to manage trials.
  • objective(trial): function where you suggest hyperparameters and return a metric to minimize or maximize.
  • study.optimize(objective, n_trials): runs the tuning process for a set number of trials.
python
import optuna

def objective(trial):
    param = trial.suggest_float('param', 0.0, 1.0)
    score = some_model_training_and_evaluation(param)  # placeholder for your training/evaluation code
    return score  # scalar metric for the study to optimize

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=100)
💻

Example

This example shows how to tune the max_depth and n_estimators hyperparameters of a RandomForestClassifier on the iris dataset using Optuna.

python
import optuna
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
import numpy as np

data = load_iris()
X, y = data.data, data.target

def objective(trial):
    max_depth = trial.suggest_int('max_depth', 2, 32)
    n_estimators = trial.suggest_int('n_estimators', 10, 200)
    clf = RandomForestClassifier(max_depth=max_depth, n_estimators=n_estimators, random_state=42)
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
    scores = cross_val_score(clf, X, y, cv=cv, scoring='accuracy')
    return 1.0 - np.mean(scores)  # minimize error

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=20)

print('Best parameters:', study.best_params)
print('Best accuracy:', 1 - study.best_value)
Output
Best parameters: {'max_depth': 11, 'n_estimators': 109}
Best accuracy: 0.9666666666666667
⚠️

Common Pitfalls

Common mistakes when using Optuna include:

  • Not returning the correct metric from the objective function (should be a scalar to minimize or maximize).
  • Forgetting to set direction='minimize' or 'maximize' in create_study() to match your metric.
  • Using non-deterministic model training without fixing random seeds, causing unstable results.
  • Not using cross-validation, which can lead to overfitting on a single train/test split.

Example of a wrong objective function and the corrected version:

python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, StratifiedKFold
import numpy as np

def wrong_objective(trial):
    max_depth = trial.suggest_int('max_depth', 2, 32)
    clf = RandomForestClassifier(max_depth=max_depth)  # no random_state: results vary run to run
    clf.fit(X, y)  # X, y as defined in the example above
    return clf.score(X, y)  # training accuracy (overfits), and the study defaults to minimize

def correct_objective(trial):
    max_depth = trial.suggest_int('max_depth', 2, 32)
    clf = RandomForestClassifier(max_depth=max_depth, random_state=42)
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
    scores = cross_val_score(clf, X, y, cv=cv, scoring='accuracy')
    return 1.0 - np.mean(scores)  # minimize cross-validated error
📊

Quick Reference

Tips for using Optuna effectively:

  • Always define an objective(trial) function that returns a scalar metric.
  • Use trial.suggest_int, trial.suggest_float, or trial.suggest_categorical to sample hyperparameters.
  • Set direction='minimize' or 'maximize' in create_study() to match your goal.
  • Use cross-validation inside the objective for robust evaluation.
  • Set random seeds for reproducibility.

Key Takeaways

Define an objective function that suggests hyperparameters and returns a scalar metric to optimize.
Create a study with the correct direction ('minimize' or 'maximize') before running optimization.
Use cross-validation inside the objective to get reliable performance estimates.
Set random seeds to ensure reproducible tuning results.
Use Optuna's suggest methods to explore different types of hyperparameters easily.