How to Use GridSearchCV from sklearn in Python

How to Use GridSearchCV in sklearn (Python) for Hyperparameter Tuning

Use GridSearchCV from sklearn.model_selection to search for the best hyperparameters by specifying a model, a parameter grid, and a scoring method. Fit it on your training data, then read the best parameters from best_params_ and the best model, refitted on the full training data, from best_estimator_.

📐

Syntax

The basic syntax of GridSearchCV involves creating an instance with a model, a dictionary of parameters to try, and optional settings like cross-validation folds and scoring metric.

estimator: The machine learning model you want to tune.
param_grid: Dictionary where keys are parameter names and values are lists of settings to try.
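cv: Number of cross-validation folds, or a cross-validation splitter object (if left unset, 5-fold cross-validation is used).
scoring: Metric used to rank parameter combinations (e.g., 'accuracy'); if omitted, the estimator's own score method is used.
n_jobs: Number of jobs to run in parallel (-1 uses all CPU cores; the default runs a single job).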

After creating the GridSearchCV object, call fit(X, y) to run the search.

python

from sklearn.model_selection import GridSearchCV

grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_train, y_train)
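Because the keys of param_grid must be names the estimator actually accepts, it can help to list them before building the grid. The sketch below uses get_params(), which every scikit-learn estimator provides; SVC and the grid values are chosen only for illustration.

```python
from sklearn.svm import SVC

# List the hyperparameter names SVC accepts; these are the valid param_grid keys
valid_names = sorted(SVC().get_params().keys())
print(valid_names)

# Check a candidate grid against those names before running the search
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
unknown = [name for name in param_grid if name not in valid_names]
print('Unknown keys:', unknown)
```

An empty list of unknown keys means every key in the grid matches a parameter of the model.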

💻

Example

This example shows how to use GridSearchCV to tune hyperparameters of a Support Vector Machine (SVM) classifier on the iris dataset. It searches for the best C and kernel values and prints the best parameters and accuracy.

python

from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define model
model = SVC()

# Define parameter grid
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}

# Create GridSearchCV
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, scoring='accuracy', n_jobs=-1)

# Fit grid search
grid_search.fit(X_train, y_train)

# Best parameters
print('Best parameters:', grid_search.best_params_)

# Predict with best model
y_pred = grid_search.best_estimator_.predict(X_test)

# Accuracy
print('Test accuracy:', accuracy_score(y_test, y_pred))

Output

Best parameters: {'C': 1, 'kernel': 'linear'}
Test accuracy: 1.0
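Besides best_params_, a fitted search also exposes the cross-validated score of the winning combination (best_score_) and a table of every combination tried (cv_results_). The following is a minimal sketch that repeats the setup above so it runs on its own; exact scores may differ between scikit-learn versions, so no output is shown.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split

# Same data, split, and grid as the example above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
grid_search = GridSearchCV(SVC(), param_grid, cv=3, scoring='accuracy')
grid_search.fit(X_train, y_train)

# Mean cross-validated accuracy of the winning combination
print('Best CV accuracy:', grid_search.best_score_)

# One entry per combination: the parameters tried and their mean CV score
results = grid_search.cv_results_
for params, mean_score in zip(results['params'], results['mean_test_score']):
    print(params, round(mean_score, 3))
```

Comparing best_score_ (computed on the training folds) with the held-out test accuracy gives a rough sense of how well the tuned model generalizes.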

⚠️

Common Pitfalls

Not specifying a parameter grid or giving incorrect parameter names causes errors.
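Using too large a grid can make the search very slow, because every combination of values is fitted once per cross-validation fold.
Running the grid search on the full dataset and then evaluating on that same data leaks information; split off a test set first and fit the search only on the training portion.
Leaving cv at its default (5-fold) without checking that it suits your data can give unreliable estimates, for example on very small datasets or time-ordered data.
Leaving n_jobs at its default runs one job at a time; set n_jobs=-1 to use all CPU cores.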
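To judge whether a grid is too large before running it, the number of fits can be counted up front. This sketch uses ParameterGrid from sklearn.model_selection; the grid and fold count are the ones from the earlier example.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
cv_folds = 3

# Every combination of values is one candidate: 3 * 2 = 6
n_candidates = len(ParameterGrid(param_grid))

# Each candidate is fitted once per fold: 6 * 3 = 18
n_fits = n_candidates * cv_folds

print(n_candidates, n_fits)  # prints: 6 18
```

With the default refit=True, GridSearchCV performs one additional fit of the best combination on the whole training set after these cross-validation fits.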

Always check parameter names match the model's parameters exactly.

python

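from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split

# Recreate the training split so this snippet runs on its own
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)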

# Wrong parameter name example
param_grid_wrong = {'C_value': [1, 10]}  # Incorrect key

model = SVC()

try:
    grid_search = GridSearchCV(model, param_grid_wrong)
    grid_search.fit(X_train, y_train)
except ValueError as e:
    print('Error:', e)

# Correct parameter grid
param_grid_correct = {'C': [1, 10]}
grid_search = GridSearchCV(model, param_grid_correct)
grid_search.fit(X_train, y_train)
print('Grid search ran successfully with correct parameters.')

Output

Error: Invalid parameter C_value for estimator SVC(). Check the list of available parameters with `estimator.get_params().keys()`.
Grid search ran successfully with correct parameters.
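The data-leakage and parameter-name pitfalls also come up when preprocessing is involved. The sketch below assumes you want to scale features before the SVM; the StandardScaler step and the step names 'scaler' and 'svc' are illustrative additions, not part of the example above. Wrapping both steps in a Pipeline means the scaler is re-fitted on the training part of each fold only, and the grid keys take the form step name, two underscores, parameter name.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scaling lives inside the pipeline, so validation folds never influence it
pipeline = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])

# Pipeline parameters are addressed as <step name>__<parameter name>
param_grid = {'svc__C': [0.1, 1, 10], 'svc__kernel': ['linear', 'rbf']}

grid_search = GridSearchCV(pipeline, param_grid, cv=3, scoring='accuracy')
grid_search.fit(X_train, y_train)

print('Best parameters:', grid_search.best_params_)
print('Held-out test accuracy:', grid_search.score(X_test, y_test))
```

Using plain 'C' as a key here would raise the same kind of invalid-parameter error shown above, because the pipeline itself has no parameter named C.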

📊

Quick Reference

Key points to remember when using GridSearchCV:

Use param_grid to specify parameters to try.
Set cv for cross-validation folds (default 5).
Use scoring to choose metric like 'accuracy' or 'roc_auc'.
Access best parameters with best_params_.
Use best_estimator_ to get the tuned model.
Set n_jobs=-1 to speed up with all CPU cores.
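As a brief illustration of switching the metric, the sketch below tunes C with scoring='roc_auc'. It assumes a two-class problem, so it uses synthetic data from make_classification instead of the three-class iris dataset; the data and grid values are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic two-class data (an assumption for illustration)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Rank candidates by area under the ROC curve instead of accuracy
grid_search = GridSearchCV(SVC(), {'C': [0.1, 1, 10]}, cv=5, scoring='roc_auc')
grid_search.fit(X, y)

print('Best C:', grid_search.best_params_['C'])
print('Best mean ROC AUC:', grid_search.best_score_)
```

Here best_score_ is reported in the units of the chosen metric, so it is a mean ROC AUC and not an accuracy.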

✅

Key Takeaways

GridSearchCV automates hyperparameter tuning by trying all parameter combinations with cross-validation.

Always provide correct parameter names matching the model's parameters in param_grid.

Use cross-validation (cv) to get reliable model performance estimates during tuning.

Access best parameters with best_params_ and best model with best_estimator_ after fitting.

Use n_jobs=-1 to speed up grid search by using all CPU cores.