How to Use GridSearchCV in sklearn Python for Hyperparameter Tuning
Use GridSearchCV from sklearn.model_selection to search for the best hyperparameters by specifying a model, a parameter grid, and a scoring method. Fit it on your training data, then access the best parameters with best_params_ and the best model with best_estimator_.
Syntax
The basic syntax of GridSearchCV involves creating an instance with a model, a dictionary of parameters to try, and optional settings like cross-validation folds and scoring metric.
- estimator: The machine learning model you want to tune.
- param_grid: Dictionary where keys are parameter names and values are lists of settings to try.
- cv: Number of cross-validation folds (default is 5).
- scoring: Metric to evaluate model performance (e.g., 'accuracy').
- n_jobs: Number of CPU cores to use (-1 uses all cores).
After creating the GridSearchCV object, call fit(X, y) to run the search.
python
from sklearn.model_selection import GridSearchCV

grid_search = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
)
grid_search.fit(X_train, y_train)
Example
This example shows how to use GridSearchCV to tune hyperparameters of a Support Vector Machine (SVM) classifier on the iris dataset. It searches for the best C and kernel values and prints the best parameters and accuracy.
python
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define model
model = SVC()

# Define parameter grid
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}

# Create GridSearchCV
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, scoring='accuracy', n_jobs=-1)

# Fit grid search
grid_search.fit(X_train, y_train)

# Best parameters
print('Best parameters:', grid_search.best_params_)

# Predict with best model
y_pred = grid_search.best_estimator_.predict(X_test)

# Accuracy
print('Test accuracy:', accuracy_score(y_test, y_pred))
Output
Best parameters: {'C': 1, 'kernel': 'linear'}
Test accuracy: 1.0
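Beyond the single best setting, the fitted object also records the cross-validated score of every combination it tried. The following is a minimal sketch, assuming the same iris data and grid as the example above, that reads best_score_ and the cv_results_ dictionary. The variable name ranked is only illustrative, and n_jobs is left at its default to keep the sketch simple.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
grid_search = GridSearchCV(SVC(), param_grid, cv=3, scoring='accuracy')
grid_search.fit(X_train, y_train)

# Mean cross-validated accuracy of the best combination
print('Best CV accuracy:', grid_search.best_score_)

# Pair each tried combination with its mean CV score, best first
results = grid_search.cv_results_
ranked = sorted(
    zip(results['mean_test_score'], results['params']),
    key=lambda pair: pair[0],
    reverse=True,
)
for mean_score, params in ranked:
    print(f'{mean_score:.3f}  {params}')
```

Note that best_score_ is the mean score over the cross-validation folds of the training data, so it will generally differ from the test accuracy computed on the held-out split.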
Common Pitfalls
- Not specifying a parameter grid or giving incorrect parameter names causes errors.
- Using too large a grid can make search very slow.
- Forgetting to split data before grid search can cause data leakage.
- Leaving cv at its default (5 folds) without checking that it suits your dataset size can lead to poor model evaluation.
- Using n_jobs=1 can slow down the search; use -1 to use all cores.
Always check parameter names match the model's parameters exactly.
python
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split

# Same data and split as in the example above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Wrong parameter name example
param_grid_wrong = {'C_value': [1, 10]}  # Incorrect key: SVC has no 'C_value' parameter
model = SVC()
try:
    grid_search = GridSearchCV(model, param_grid_wrong)
    grid_search.fit(X_train, y_train)
except ValueError as e:
    print('Error:', e)

# Correct parameter grid
param_grid_correct = {'C': [1, 10]}
grid_search = GridSearchCV(model, param_grid_correct)
grid_search.fit(X_train, y_train)
print('Grid search ran successfully with correct parameters.')
Output
Error: Invalid parameter C_value for estimator SVC(). Check the list of available parameters with `estimator.get_params().keys()`.
Grid search ran successfully with correct parameters.
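The pitfall about overly large grids can be checked before any model is trained. The number of cross-validation fits is the number of parameter combinations multiplied by the number of folds, plus one final refit on the full training data when refit=True (the default). Here is a rough sketch using ParameterGrid from sklearn.model_selection to count combinations; the grid values and the name n_folds are illustrative assumptions.

```python
from sklearn.model_selection import ParameterGrid

# Illustrative grid: 4 values of C, 2 kernels, 2 gamma settings
param_grid = {
    'C': [0.1, 1, 10, 100],
    'kernel': ['linear', 'rbf'],
    'gamma': ['scale', 'auto'],
}
n_folds = 5  # would be passed to GridSearchCV as cv=5

n_combinations = len(ParameterGrid(param_grid))  # 4 * 2 * 2 = 16
n_fits = n_combinations * n_folds                # 16 * 5 = 80

print('Combinations:', n_combinations)
print('Cross-validation fits:', n_fits)
```

Adding one more parameter with three candidate values would triple both numbers, which is why it usually pays to start with a coarse grid and refine it around the best region.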
Quick Reference
Key points to remember when using GridSearchCV:
- Use param_grid to specify parameters to try.
- Set cv for cross-validation folds (default 5).
- Use scoring to choose a metric like 'accuracy' or 'roc_auc'.
- Access best parameters with best_params_.
- Use best_estimator_ to get the tuned model.
- Set n_jobs=-1 to speed up with all CPU cores.
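As an illustration of the scoring option, the sketch below selects hyperparameters by ROC AUC instead of accuracy. Plain 'roc_auc' expects a binary target, so this example assumes a synthetic two-class dataset from make_classification rather than the three-class iris data; the dataset size and grid values are arbitrary choices for demonstration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic two-class data, used only for illustration
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

param_grid = {'C': [0.1, 1, 10]}
grid_search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5, scoring='roc_auc')
grid_search.fit(X, y)

# best_score_ is now a mean ROC AUC, not an accuracy
print('Best parameters:', grid_search.best_params_)
print('Best mean ROC AUC:', grid_search.best_score_)
```

The best parameters can change with the metric, so it is worth picking the scoring value that reflects how the model will actually be judged.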
Key Takeaways
GridSearchCV automates hyperparameter tuning by trying all parameter combinations with cross-validation.
Always provide correct parameter names matching the model's parameters in param_grid.
Use cross-validation (cv) to get reliable model performance estimates during tuning.
Access best parameters with best_params_ and best model with best_estimator_ after fitting.
Use n_jobs=-1 to speed up grid search by using all CPU cores.
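To make the first takeaway concrete, the following sketch writes the search out by hand: it loops over every combination with ParameterGrid, scores each one with cross_val_score, and compares the result with GridSearchCV on the same data. This is an illustration of the idea under the assumptions of the earlier iris example, not a description of how the library is implemented internally; the manual_* names are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import (
    GridSearchCV,
    ParameterGrid,
    cross_val_score,
    train_test_split,
)
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}

# Manual version: score every combination with 3-fold cross-validation
manual_best_score, manual_best_params = -1.0, None
for params in ParameterGrid(param_grid):
    scores = cross_val_score(SVC(**params), X_train, y_train, cv=3, scoring='accuracy')
    if scores.mean() > manual_best_score:
        manual_best_score, manual_best_params = scores.mean(), params

# The same search done by GridSearchCV
grid_search = GridSearchCV(SVC(), param_grid, cv=3, scoring='accuracy')
grid_search.fit(X_train, y_train)

print('Manual loop :', manual_best_params, manual_best_score)
print('GridSearchCV:', grid_search.best_params_, grid_search.best_score_)
```

Besides saving the loop, GridSearchCV also refits the best model on the full training set and can run the fits in parallel through n_jobs.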