
How to Use GridSearchCV in sklearn Python for Hyperparameter Tuning

Use GridSearchCV from sklearn.model_selection to search for the best hyperparameters by specifying a model, a parameter grid, and a scoring method. Fit it on your training data, then access the best parameters with best_params_ and the best model with best_estimator_.

Syntax

The basic syntax of GridSearchCV involves creating an instance with a model, a dictionary of parameters to try, and optional settings like cross-validation folds and scoring metric.

  • estimator: The machine learning model you want to tune.
  • param_grid: Dictionary where keys are parameter names and values are lists of settings to try.
  • cv: Number of cross-validation folds (default is 5).
  • scoring: Metric to evaluate model performance (e.g., 'accuracy').
  • n_jobs: Number of CPU cores to use (-1 uses all cores).

After creating the GridSearchCV object, call fit(X, y) to run the search.

python
from sklearn.model_selection import GridSearchCV

grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_train, y_train)

Example

This example shows how to use GridSearchCV to tune hyperparameters of a Support Vector Machine (SVM) classifier on the iris dataset. It searches for the best C and kernel values and prints the best parameters and accuracy.

python
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define model
model = SVC()

# Define parameter grid
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}

# Create GridSearchCV
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, scoring='accuracy', n_jobs=-1)

# Fit grid search
grid_search.fit(X_train, y_train)

# Best parameters
print('Best parameters:', grid_search.best_params_)

# Predict with best model
y_pred = grid_search.best_estimator_.predict(X_test)

# Accuracy
print('Test accuracy:', accuracy_score(y_test, y_pred))
Output
Best parameters: {'C': 1, 'kernel': 'linear'}
Test accuracy: 1.0
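Beyond best_params_, a fitted GridSearchCV also exposes best_score_ (the mean cross-validated score of the winning combination) and cv_results_ (per-combination results). A short sketch, repeating the same iris setup so it runs on its own:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split

# Same setup as the example above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

grid_search = GridSearchCV(SVC(), {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']},
                           cv=3, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_train, y_train)

# Mean cross-validated accuracy of the best parameter combination
print('Best CV score:', grid_search.best_score_)

# cv_results_ is a dict; each entry has one value per parameter combination
for params, mean in zip(grid_search.cv_results_['params'],
                        grid_search.cv_results_['mean_test_score']):
    print(params, round(mean, 3))
```

Inspecting cv_results_ this way is useful for spotting whether the winning combination beat the runners-up by a meaningful margin or only by noise.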

Common Pitfalls

  • Misspelled or invalid parameter names in param_grid raise an error at fit time.
  • A large grid is searched exhaustively, so runtime grows multiplicatively with every parameter list you add.
  • Forgetting to hold out a test set before running the grid search leaks test data into tuning and inflates the final evaluation.
  • Relying on the default cv=5 may be a poor fit for very small or imbalanced datasets; choose the number of folds deliberately.
  • The default runs on a single core, which can make the search slow; set n_jobs=-1 to use all cores.

Always check parameter names match the model's parameters exactly.

python
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split

# Data for the demonstration
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Wrong parameter name example
param_grid_wrong = {'C_value': [1, 10]}  # SVC has no parameter named 'C_value'

model = SVC()

try:
    grid_search = GridSearchCV(model, param_grid_wrong)
    grid_search.fit(X_train, y_train)
except ValueError as e:
    print('Error:', e)

# Correct parameter grid
param_grid_correct = {'C': [1, 10]}
grid_search = GridSearchCV(model, param_grid_correct)
grid_search.fit(X_train, y_train)
print('Grid search ran successfully with correct parameters.')
print('Grid search ran successfully with correct parameters.')
Output
Error: Invalid parameter C_value for estimator SVC(). Check the list of available parameters with `estimator.get_params().keys()`.
Grid search ran successfully with correct parameters.
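As the error message suggests, you can list the exact parameter names an estimator accepts with get_params() before building the grid. A quick sketch:

```python
from sklearn.svm import SVC

# The keys of get_params() are the only names GridSearchCV accepts for SVC
valid_params = sorted(SVC().get_params().keys())
print(valid_params)

# 'C' is a valid key; 'C_value' is not
print('C' in valid_params)        # True
print('C_value' in valid_params)  # False
```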

Quick Reference

Key points to remember when using GridSearchCV:

  • Use param_grid to specify parameters to try.
  • Set cv for cross-validation folds (default 5).
  • Use scoring to choose metric like 'accuracy' or 'roc_auc'.
  • Access best parameters with best_params_.
  • Use best_estimator_ to get the tuned model.
  • Set n_jobs=-1 to speed up with all CPU cores.
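One related pattern worth knowing (a sketch beyond the examples above): wrapping preprocessing and the model in a Pipeline so that steps like scaling are refit inside each cross-validation fold, which avoids the data-leakage pitfall mentioned earlier. Parameters of pipeline steps are addressed with the '<step>__<parameter>' naming convention:

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scaling is refit on each training fold, so validation folds never leak into it
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])

# Step parameters use the '<step>__<parameter>' naming convention
param_grid = {'svc__C': [0.1, 1, 10], 'svc__kernel': ['linear', 'rbf']}

grid_search = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_train, y_train)
print('Best parameters:', grid_search.best_params_)
```

The fitted best_estimator_ is then a complete pipeline, so calling predict on it automatically applies the same scaling before classification.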

Key Takeaways

GridSearchCV automates hyperparameter tuning by trying all parameter combinations with cross-validation.
Always provide correct parameter names matching the model's parameters in param_grid.
Use cross-validation (cv) to get reliable model performance estimates during tuning.
Access best parameters with best_params_ and best model with best_estimator_ after fitting.
Use n_jobs=-1 to speed up grid search by using all CPU cores.