How to Use RandomizedSearchCV in sklearn with Python
Use RandomizedSearchCV from sklearn.model_selection to search hyperparameters by sampling random combinations. Initialize it with an estimator, parameter distributions, a number of iterations, and a scoring metric, then call fit() on your data to find the best parameters.

Syntax
The basic syntax of RandomizedSearchCV involves specifying the model, parameter distributions, number of parameter settings to try, and scoring method. You then fit it on your training data to perform the search.
- estimator: The machine learning model to tune.
- param_distributions: Dictionary with parameter names as keys and distributions or lists of parameter values to try.
- n_iter: Number of parameter settings sampled.
- scoring: Metric to evaluate performance.
- cv: Number of cross-validation folds.
```python
from sklearn.model_selection import RandomizedSearchCV

random_search = RandomizedSearchCV(
    estimator=model,
    param_distributions=param_dist,
    n_iter=10,
    scoring='accuracy',
    cv=5,
    random_state=42
)
random_search.fit(X_train, y_train)
```
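The real advantage of RandomizedSearchCV over a fixed grid is that param_distributions can hold continuous distributions, so each iteration draws a fresh value instead of picking from a short list. As a sketch (the scipy distributions and parameter ranges here are illustrative choices, not the only valid ones):

```python
from scipy.stats import randint, uniform
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Distributions are sampled per iteration: randint draws integers from
# [10, 200), uniform draws floats from [0.01, 0.01 + 0.2).
param_dist = {
    'n_estimators': randint(10, 200),
    'min_samples_split': uniform(0.01, 0.2),  # fraction of samples per split
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=5,
    cv=3,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_)
```

Any object with an rvs() method works as a distribution, so scipy.stats is the usual source.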
Example
This example shows how to use RandomizedSearchCV to tune hyperparameters of a RandomForestClassifier on the iris dataset. It searches over number of trees and max depth, then prints the best parameters and accuracy.
```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define model
model = RandomForestClassifier(random_state=42)

# Define parameter distribution
param_dist = {
    'n_estimators': [10, 50, 100, 200],
    'max_depth': [None, 5, 10, 20],
    'max_features': ['sqrt', 'log2']
}

# Setup RandomizedSearchCV
random_search = RandomizedSearchCV(
    estimator=model,
    param_distributions=param_dist,
    n_iter=5,
    scoring='accuracy',
    cv=3,
    random_state=42
)

# Fit
random_search.fit(X_train, y_train)

# Predict and evaluate
best_model = random_search.best_estimator_
y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Best parameters: {random_search.best_params_}")
print(f"Test accuracy: {accuracy:.3f}")
```
Output
Best parameters: {'n_estimators': 200, 'max_features': 'sqrt', 'max_depth': 10}
Test accuracy: 0.978
Common Pitfalls
Common mistakes when using RandomizedSearchCV include:
- Not setting random_state, which makes results non-reproducible.
- Using too few iterations (n_iter), which may miss good parameters.
- Passing parameter grids instead of distributions, which works but loses the benefit of sampling from continuous ranges.
- Not scaling or preprocessing data if required by the model.

Always check that parameter names match the estimator's parameters exactly.
```python
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
import numpy as np

# Wrong: a small fixed grid and no random_state
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
random_search = RandomizedSearchCV(SVC(), param_distributions=param_grid, n_iter=2)

# Right: use random_state and distributions
param_dist = {'C': np.logspace(-3, 2, 100), 'kernel': ['linear', 'rbf']}
random_search = RandomizedSearchCV(SVC(), param_distributions=param_dist, n_iter=10, random_state=42)
```
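Parameter names can be validated before running the search: every estimator exposes its tunable parameters via get_params(). A minimal sketch of that check (the param_dist here is just an example):

```python
from sklearn.svm import SVC

param_dist = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}

# get_params() returns a dict of the estimator's settable parameters,
# so any key in param_dist that is not in it would raise at fit time.
valid = set(SVC().get_params().keys())
unknown = set(param_dist) - valid
if unknown:
    raise ValueError(f"Unknown parameters: {unknown}")
```

Running this before the search turns a cryptic fit-time error into an immediate, readable one.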
Quick Reference
Key points to remember when using RandomizedSearchCV:
- Use param_distributions with distributions or lists.
- Set n_iter to control search size.
- Set random_state for reproducibility.
- Use cv for cross-validation folds.
- Call fit() to start the search.
- Access best parameters with best_params_.
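After fit(), the search object exposes more than just best_params_: best_score_ holds the best mean cross-validated score, best_estimator_ the refitted model, and cv_results_ the full per-candidate breakdown. A short sketch (the LogisticRegression setup is an illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={'C': [0.01, 0.1, 1, 10, 100]},
    n_iter=4,
    cv=3,
    random_state=42,
)
search.fit(X, y)

print(search.best_params_)                    # best sampled setting
print(search.best_score_)                     # its mean cross-validated score
print(search.cv_results_['mean_test_score'])  # score of every sampled setting
```

cv_results_ is a plain dict of arrays, so it can be loaded into a pandas DataFrame for sorting and inspection.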
Key Takeaways
- RandomizedSearchCV samples random hyperparameter combinations to efficiently find good settings.
- Always set random_state for reproducible results.
- Use enough iterations (n_iter) to explore the parameter space well.
- Check parameter names carefully to match the model's expected parameters.
- Access best parameters with best_params_ after fitting.