What is RandomizedSearchCV in ML Python?

Introduction

RandomizedSearchCV searches for good settings (hyperparameters) for a machine learning model by scoring a fixed number of randomly sampled combinations with cross-validation, instead of trying every possible combination.

When you want to improve your model by tuning its settings but have many options to try.

When you have limited time and cannot try every possible combination of settings.

When you want a quick way to find good model settings without testing all possibilities.

When you want each combination scored on held-out folds rather than on the data it was trained on, which lowers the risk of choosing settings that overfit.

When you want to compare different models or settings fairly using cross-validation.

Syntax

from sklearn.model_selection import RandomizedSearchCV

random_search = RandomizedSearchCV(
    estimator, param_distributions, n_iter=10, scoring=None, cv=5, random_state=None
)

random_search.fit(X_train, y_train)

estimator is your machine learning model, like a decision tree or logistic regression.

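param_distributions is a dictionary whose keys are parameter names of the estimator and whose values are either lists of candidate values or distributions to sample from (objects with an rvs method, such as those in scipy.stats).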
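
As a hedged illustration of the two kinds of values, the sketch below mixes a plain list with scipy.stats distributions and draws a few candidates using ParameterSampler from sklearn.model_selection. The parameter names (C, degree, kernel) and their ranges are assumptions chosen for an SVC-style model, not values taken from this lesson.

```python
from scipy.stats import loguniform, randint
from sklearn.model_selection import ParameterSampler

# Lists are sampled uniformly; scipy.stats distributions are sampled via rvs().
param_distributions = {
    "C": loguniform(1e-2, 1e2),   # continuous, log-uniform between 0.01 and 100
    "degree": randint(2, 5),      # integers 2, 3, 4 (upper bound is exclusive)
    "kernel": ["linear", "rbf"],  # discrete choices
}

# Draw 5 candidate settings without training any model.
samples = list(ParameterSampler(param_distributions, n_iter=5, random_state=0))
for candidate in samples:
    print(candidate)
```

Passing the same dictionary to RandomizedSearchCV as param_distributions would sample candidates in the same way, with each one then scored by cross-validation.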

Examples

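This samples 4 of the 9 possible combinations of number of trees and maximum tree depth for a random forest and scores each with 3-fold cross-validation. It assumes X_train and y_train already exist.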

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

param_dist = {'n_estimators': [10, 50, 100], 'max_depth': [None, 10, 20]}
model = RandomForestClassifier()
random_search = RandomizedSearchCV(model, param_dist, n_iter=4, cv=3)
random_search.fit(X_train, y_train)

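This samples 3 of the 6 possible combinations of the SVM regularization strength C and kernel type, using 5-fold cross-validation. It also assumes X_train and y_train already exist.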

from sklearn.svm import SVC
from sklearn.model_selection import RandomizedSearchCV

param_dist = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
model = SVC()
random_search = RandomizedSearchCV(model, param_dist, n_iter=3, cv=5)
random_search.fit(X_train, y_train)

Sample Program

This example loads the iris flower data, splits it, and uses RandomizedSearchCV to find good settings for a random forest. It prints the best settings and test accuracy.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Load data
X, y = load_iris(return_X_y=True)

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define model and parameter distribution
model = RandomForestClassifier(random_state=42)
param_dist = {
    'n_estimators': [10, 50, 100],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5, 10]
}

# Setup RandomizedSearchCV
random_search = RandomizedSearchCV(model, param_dist, n_iter=5, cv=3, random_state=42)

# Fit model
random_search.fit(X_train, y_train)

# Best parameters
best_params = random_search.best_params_

# Predict on test data with the best model (refit on the full training set by default)
y_pred = random_search.predict(X_test)

# Calculate accuracy
acc = accuracy_score(y_test, y_pred)

print(f"Best parameters: {best_params}")
print(f"Test accuracy: {acc:.2f}")
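
Beyond best_params_, the fitted search object also records how every sampled combination scored. The following sketch repeats the setup of the sample program so that it runs on its own, then prints best_score_ and the per-candidate mean scores from cv_results_. The variable names search and results are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

param_dist = {
    "n_estimators": [10, 50, 100],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_dist,
    n_iter=5,
    cv=3,
    random_state=42,
)
search.fit(X_train, y_train)

# Mean cross-validated score of the best combination (computed on X_train only)
print(f"Best CV score: {search.best_score_:.3f}")

# One entry per sampled combination: mean and spread of the fold scores
results = search.cv_results_
for params, mean, std in zip(
    results["params"], results["mean_test_score"], results["std_test_score"]
):
    print(f"{mean:.3f} +/- {std:.3f}  {params}")
```

Comparing the candidates this way shows whether the best combination is clearly ahead or only marginally better than the others, which is worth knowing before trusting it.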

Important Notes

RandomizedSearchCV tries random combinations, so results can change each run unless you set random_state.

Use n_iter to control how many combinations to try; more tries can find better settings but take longer.

Cross-validation (cv) scores each combination on several held-out folds of the training data, so the chosen settings are less likely to look good only because of one lucky split.
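
The notes on random_state and n_iter can be checked without training any model. This is a minimal sketch that reuses the parameter lists from the sample program and relies on ParameterGrid and ParameterSampler from sklearn.model_selection; the variable names are illustrative.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

param_dist = {
    "n_estimators": [10, 50, 100],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10],
}
n_iter, cv = 5, 3

# Size of the full grid versus the random-search budget
total_combinations = len(ParameterGrid(param_dist))
print(f"All combinations: {total_combinations}")
print(f"Exhaustive cross-validation fits: {total_combinations * cv}")
print(f"Random-search cross-validation fits: {n_iter * cv}")

# The same random_state yields the same sampled candidates
run_a = list(ParameterSampler(param_dist, n_iter=n_iter, random_state=42))
run_b = list(ParameterSampler(param_dist, n_iter=n_iter, random_state=42))
print(f"Same candidates with the same seed: {run_a == run_b}")
```

By default RandomizedSearchCV also refits the best combination once on the whole training set, so the real number of fits is one higher than the cross-validation count. Fixing random_state on the search only fixes which candidates are drawn; an estimator with its own randomness, such as a random forest, needs its own random_state as well.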

Summary

RandomizedSearchCV helps find good model settings by trying random options.

It is usually faster than trying every combination because it evaluates only n_iter of them, which helps most when there are many settings to tune.

Always check best_params_ and evaluate the final model on data that was not used during the search.