
How to Use RandomizedSearchCV in sklearn with Python

Use RandomizedSearchCV from sklearn.model_selection to search hyperparameters by sampling random combinations. Initialize it with an estimator, parameter distribution, number of iterations, and scoring metric, then call fit() on your data to find the best parameters.
📐

Syntax

The basic syntax of RandomizedSearchCV involves specifying the model, parameter distributions, number of parameter settings to try, and scoring method. You then fit it on your training data to perform the search.

  • estimator: The machine learning model to tune.
  • param_distributions: Dictionary with parameter names as keys and distributions or lists of parameter values to try.
  • n_iter: Number of parameter settings sampled.
  • scoring: Metric to evaluate performance.
  • cv: Number of cross-validation folds.
python
from sklearn.model_selection import RandomizedSearchCV

random_search = RandomizedSearchCV(
    estimator=model,
    param_distributions=param_dist,
    n_iter=10,
    scoring='accuracy',
    cv=5,
    random_state=42
)

random_search.fit(X_train, y_train)
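Beyond fixed lists, param_distributions also accepts scipy.stats distributions, so integer- or real-valued hyperparameters are sampled fresh on each iteration instead of being drawn from a preset list. A minimal sketch on the iris dataset (the ranges chosen here are illustrative):

```python
from scipy.stats import randint, uniform
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Distributions are sampled per iteration, unlike fixed lists
param_dist = {
    'n_estimators': randint(10, 150),   # integers in [10, 150)
    'max_features': uniform(0.1, 0.9),  # floats in [0.1, 1.0), fraction of features
}

random_search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=5,
    scoring='accuracy',
    cv=3,
    random_state=42
)
random_search.fit(X, y)
print(random_search.best_params_)
```

Mixing distributions and lists in the same dictionary is also allowed; list entries are sampled uniformly.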
💻

Example

This example shows how to use RandomizedSearchCV to tune hyperparameters of a RandomForestClassifier on the iris dataset. It searches over the number of trees, maximum depth, and number of features considered per split, then prints the best parameters and test accuracy.

python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.metrics import accuracy_score
import numpy as np

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define model
model = RandomForestClassifier(random_state=42)

# Define parameter distribution
param_dist = {
    'n_estimators': [10, 50, 100, 200],
    'max_depth': [None, 5, 10, 20],
    'max_features': ['sqrt', 'log2']
}

# Setup RandomizedSearchCV
random_search = RandomizedSearchCV(
    estimator=model,
    param_distributions=param_dist,
    n_iter=5,
    scoring='accuracy',
    cv=3,
    random_state=42
)

# Fit
random_search.fit(X_train, y_train)

# Predict and evaluate
best_model = random_search.best_estimator_
y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Best parameters: {random_search.best_params_}")
print(f"Test accuracy: {accuracy:.3f}")
Output
Best parameters: {'n_estimators': 200, 'max_features': 'sqrt', 'max_depth': 10}
Test accuracy: 0.978
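Besides best_params_, the fitted search object exposes best_score_ (the best mean cross-validated score) and cv_results_ (a dictionary with one entry per sampled candidate). A short sketch that reruns a small search and lists each candidate's score (the parameter values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions={'n_estimators': [10, 50, 100], 'max_depth': [None, 5]},
    n_iter=4,
    scoring='accuracy',
    cv=3,
    random_state=42
)
random_search.fit(X, y)

# cv_results_ holds one entry per sampled candidate
for params, score in zip(random_search.cv_results_['params'],
                         random_search.cv_results_['mean_test_score']):
    print(f"{score:.3f}  {params}")

print(f"Best CV score: {random_search.best_score_:.3f}")
```

Inspecting cv_results_ shows how sensitive the score is to each parameter, which helps decide whether a larger n_iter is worthwhile.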
⚠️

Common Pitfalls

Common mistakes when using RandomizedSearchCV include:

  • Not setting random_state, which makes results non-reproducible.
  • Using too few iterations (n_iter), which may miss good parameters.
  • Passing fixed lists instead of distributions, which works but limits the search to a few preset values rather than sampling from a continuous range.
  • Not scaling or preprocessing data if required by the model.
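For models that need scaling, wrapping the preprocessing and the estimator in a Pipeline keeps the scaler fit only on each training fold during cross-validation; pipeline parameters are then addressed as step__param. A sketch assuming an SVC step named 'svc':

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Scaling lives inside the pipeline, so it is refit per CV fold
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])

# Pipeline parameter names: step name + double underscore + parameter
param_dist = {'svc__C': [0.1, 1, 10], 'svc__kernel': ['linear', 'rbf']}

search = RandomizedSearchCV(pipe, param_distributions=param_dist,
                            n_iter=4, cv=3, random_state=42)
search.fit(X, y)
print(search.best_params_)
```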

Always check that parameter names match the estimator's parameters exactly.

python
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
import numpy as np

# Wrong: Using grid instead of distributions and no random_state
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
random_search = RandomizedSearchCV(SVC(), param_distributions=param_grid, n_iter=2)

# Right: Use random_state and distributions
param_dist = {'C': np.logspace(-3, 2, 100), 'kernel': ['linear', 'rbf']}
random_search = RandomizedSearchCV(SVC(), param_distributions=param_dist, n_iter=10, random_state=42)
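A quick way to check names before launching a search is get_params(), which returns every parameter an estimator accepts; only those keys are valid in param_distributions. A small sketch with SVC:

```python
from sklearn.svm import SVC

# get_params() returns a dict of all constructor parameters;
# its keys are the only valid names in param_distributions
valid_names = SVC().get_params().keys()
print(sorted(valid_names))

# A quick guard before building the search
param_dist = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
unknown = [name for name in param_dist if name not in valid_names]
assert not unknown, f"Unknown parameters: {unknown}"
```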
📊

Quick Reference

Key points to remember when using RandomizedSearchCV:

  • Use param_distributions with distributions or lists.
  • Set n_iter to control search size.
  • Set random_state for reproducibility.
  • Use cv for cross-validation folds.
  • Call fit() to start the search.
  • Access best parameters with best_params_.

Key Takeaways

RandomizedSearchCV samples random hyperparameter combinations to efficiently find good settings.
Always set random_state for reproducible results.
Use enough iterations (n_iter) to explore the parameter space well.
Check parameter names carefully to match the model's expected parameters.
Access best parameters with best_params_ after fitting.