
RandomizedSearchCV in ML Python

Introduction
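RandomizedSearchCV helps find good settings (hyperparameters) for a machine learning model. Instead of trying every possible combination, it samples a fixed number of random combinations from the options you give it and scores each one with cross-validation.
Typical situations where it helps: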
When you want to improve your model by tuning its settings but have many options to try.
When you have limited time and cannot try every possible combination of settings.
When you want a quick way to find good model settings without testing all possibilities.
When you want each setting judged on data the model was not trained on, which lowers the risk of choosing settings that overfit.
When you want to compare different models or settings fairly using cross-validation.
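To see what "random combinations" means in practice, the sketch below uses scikit-learn's ParameterSampler, which draws candidates the same way RandomizedSearchCV does, without fitting any model. The three settings are the ones used in the sample program later on; the choice of 5 draws is only for illustration.

```python
from sklearn.model_selection import ParameterSampler

# The same three settings as the sample program: 3 x 3 x 3 = 27 possible combinations
param_dist = {
    'n_estimators': [10, 50, 100],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5, 10],
}

# Draw 5 of the 27 combinations at random; random_state makes the draw repeatable.
# Because every value here is a list, combinations are drawn without replacement.
sampled = list(ParameterSampler(param_dist, n_iter=5, random_state=42))

for params in sampled:
    print(params)
```

Each printed dictionary is one candidate that a search with n_iter=5 would train and score; the other 22 combinations are never tried.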
Syntax
ML Python
from sklearn.model_selection import RandomizedSearchCV

# estimator, param_distributions, X_train and y_train are placeholders for your own objects
random_search = RandomizedSearchCV(
    estimator, param_distributions, n_iter=10, scoring=None, cv=5, random_state=None
)

random_search.fit(X_train, y_train)
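estimator is your machine learning model, like a decision tree or logistic regression.
param_distributions is a dictionary whose keys are the model's parameter names (such as 'max_depth') and whose values are either lists of candidate values, which are sampled uniformly, or scipy.stats distributions, which are sampled through their rvs() method.
n_iter is how many combinations to sample and evaluate (default 10).
scoring is the metric used to compare combinations; None uses the estimator's own score method, which is accuracy for classifiers and R² for regressors.
cv is the number of cross-validation folds (5 when left as None); for classifiers, an integer cv gives stratified folds.
random_state fixes the random sampling so the same combinations are drawn on every run.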
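The values in param_distributions do not have to be lists. A minimal sketch, assuming SciPy 1.4 or later (for scipy.stats.loguniform) and using iris with an SVC as stand-ins, mixes a continuous range for C with a plain list for kernel:

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_dist = {
    # A continuous range: C is drawn between 0.01 and 100, evenly on a log scale
    'C': loguniform(1e-2, 1e2),
    # A plain list: each kernel is equally likely
    'kernel': ['linear', 'rbf'],
}

# With a distribution in the mix, combinations are drawn with replacement,
# so n_iter is not limited by the number of list values
random_search = RandomizedSearchCV(SVC(), param_dist, n_iter=10, cv=5, random_state=0)
random_search.fit(X, y)  # the whole iris set, for brevity

print(random_search.best_params_)
```

A log-uniform range is a common choice for parameters like C whose useful values span several orders of magnitude; a fixed list such as [0.1, 1, 10] can only ever return one of its three values.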
Examples
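This samples 4 of the 9 possible combinations of number of trees (n_estimators) and tree depth (max_depth) for a random forest and scores each with 3-fold cross-validation. It assumes X_train and y_train already hold your training data.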
ML Python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

param_dist = {'n_estimators': [10, 50, 100], 'max_depth': [None, 10, 20]}
model = RandomForestClassifier()
random_search = RandomizedSearchCV(model, param_dist, n_iter=4, cv=3)
random_search.fit(X_train, y_train)
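This samples 3 of the 6 possible combinations of the SVM regularization strength C and the kernel type, scoring each with 5-fold cross-validation. Again, X_train and y_train are assumed to exist.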
ML Python
from sklearn.svm import SVC
from sklearn.model_selection import RandomizedSearchCV

param_dist = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
model = SVC()
random_search = RandomizedSearchCV(model, param_dist, n_iter=3, cv=5)
random_search.fit(X_train, y_train)
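After fit finishes, the search object keeps the outcome in attributes such as best_params_, best_score_ and best_estimator_. A runnable version of the SVM example, assuming the iris dataset as stand-in data and adding random_state=0 so the draw repeats:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.svm import SVC

# Iris is a stand-in for your own data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

param_dist = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
random_search = RandomizedSearchCV(SVC(), param_dist, n_iter=3, cv=5, random_state=0)
random_search.fit(X_train, y_train)

print(random_search.best_params_)  # the winning combination
print(random_search.best_score_)   # its mean accuracy across the 5 folds

# With refit=True (the default), best_estimator_ is an SVC retrained on all of X_train
best_model = random_search.best_estimator_
print(best_model.score(X_test, y_test))  # accuracy on the held-out test data
```

best_score_ is measured on cross-validation folds inside the training data, so the test-set score is the fairer estimate of how the tuned model will do on new data.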
Sample Program
This example loads the iris flower data, splits it, and uses RandomizedSearchCV to find good settings for a random forest. It prints the best settings and test accuracy.
ML Python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.metrics import accuracy_score

# Load data
X, y = load_iris(return_X_y=True)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define model and parameter distribution
model = RandomForestClassifier(random_state=42)
param_dist = {
    'n_estimators': [10, 50, 100],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5, 10]
}

# Set up RandomizedSearchCV: 5 of the 27 combinations, each scored with 3-fold CV
random_search = RandomizedSearchCV(model, param_dist, n_iter=5, cv=3, random_state=42)

# Fit model
random_search.fit(X_train, y_train)

# Best parameters
best_params = random_search.best_params_

# Predict on test data (refit=True, the default, has already retrained the best combination on all of X_train)
y_pred = random_search.predict(X_test)

# Calculate accuracy
acc = accuracy_score(y_test, y_pred)

print(f"Best parameters: {best_params}")
print(f"Test accuracy: {acc:.2f}")
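best_params_ shows only the winner. To see how every sampled combination scored, the search also fills cv_results_, a dictionary of arrays with one entry per candidate. A sketch that repeats the sample program's setup so it runs on its own:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Same setup as the sample program above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
param_dist = {
    'n_estimators': [10, 50, 100],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5, 10],
}
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42), param_dist, n_iter=5, cv=3, random_state=42
)
random_search.fit(X_train, y_train)

# cv_results_ holds one entry per sampled combination
results = random_search.cv_results_

# Print the candidates from best (rank 1) to worst
for i in np.argsort(results['rank_test_score']):
    print(results['rank_test_score'][i], round(results['mean_test_score'][i], 3), results['params'][i])
```

If several candidates share the top rank, the scores alone cannot separate them, which is a hint that a cheaper setting (fewer trees, for example) may be just as good.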
Important Notes
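RandomizedSearchCV samples combinations at random, so results can change between runs unless you set random_state on the search, and also on the estimator if it is random itself, as a random forest is.
Use n_iter to control how many combinations are tried. More tries can find better settings but take longer: the search fits n_iter × cv models, plus one final refit of the best combination.
Cross-validation (cv) scores each combination on several held-out parts of the training data, so the chosen settings are less likely to be ones that only happen to fit one particular split. A separate test set, as in the sample program, is still needed for an unbiased final score.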
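The notes on random_state and n_iter can be checked directly. This sketch, using iris and a hypothetical helper run_search for illustration, runs the same search twice with every seed fixed and counts the fits:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)
param_dist = {
    'n_estimators': [10, 50, 100],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5, 10],
}

def run_search(seed):
    """Hypothetical helper: one search with the seed fixed on both the sampler and the forest."""
    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=seed),
        param_dist,
        n_iter=4,
        cv=3,
        random_state=seed,
    )
    return search.fit(X, y)

first = run_search(0)
second = run_search(0)

# Same seeds: the same combinations are drawn and they get the same scores
print(first.cv_results_['params'] == second.cv_results_['params'])  # True
print(first.best_params_ == second.best_params_)                    # True

# n_iter combinations x cv folds = 4 x 3 = 12 model fits, plus one refit of the best
print(len(first.cv_results_['params']) * first.n_splits_)  # 12
```

Leaving random_state=None on either the search or the forest would let the two runs drift apart.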
Summary
RandomizedSearchCV helps find good model settings by trying random options.
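It is faster than trying every combination (as GridSearchCV does) because it evaluates only n_iter combinations, which makes it a good fit when there are many settings or continuous ranges to search.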
Always check the best parameters and test your model on new data.
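To make the speed difference concrete, this sketch runs GridSearchCV and RandomizedSearchCV over the sample program's 27-combination grid on iris and compares how many candidates each one evaluates:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)
param_dist = {
    'n_estimators': [10, 50, 100],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5, 10],
}
model = RandomForestClassifier(random_state=42)

# Exhaustive: all 27 combinations, 3 folds each = 81 fits
grid = GridSearchCV(model, param_dist, cv=3).fit(X, y)
# Random: only 5 of the 27 combinations, 3 folds each = 15 fits
rand = RandomizedSearchCV(model, param_dist, n_iter=5, cv=3, random_state=42).fit(X, y)

print(len(grid.cv_results_['params']), round(grid.best_score_, 3))
print(len(rand.cv_results_['params']), round(rand.best_score_, 3))
```

On a small, fully listed grid like this one, random search can at best match the exhaustive search, because it scores a subset of the same candidates on the same folds. Its advantage is the smaller number of fits, which matters more as the grid grows.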