We use hyperparameter tuning to search for the model settings that make a machine learning model perform best on new, unseen data.
Hyperparameter Tuning with GridSearchCV in Python
Introduction
When you want to improve your model's accuracy by trying different settings.
When you have a model with options like tree depth or number of neighbors to choose from.
When you want to avoid guessing which settings work best.
When you want to compare many combinations of settings automatically.
Syntax
```python
from sklearn.model_selection import GridSearchCV

model = SomeModel()
param_grid = {'param1': [values], 'param2': [values]}

grid = GridSearchCV(model, param_grid, cv=number_of_folds)
grid.fit(X_train, y_train)

best_model = grid.best_estimator_
best_params = grid.best_params_
```
GridSearchCV tries all combinations of parameters you give it.
cv sets the number of cross-validation folds: the training data is split into that many parts, and each parameter combination is scored on every fold.
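The search grows multiplicatively with each parameter you add. As a quick illustration, scikit-learn's ParameterGrid helper (which GridSearchCV uses internally) can enumerate the combinations a grid produces:

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'max_depth': [3, 5, 7], 'min_samples_split': [2, 4]}

# Every combination of the listed values: 3 depths x 2 split sizes
combos = list(ParameterGrid(param_grid))
print(len(combos))  # 6 combinations; with cv=3, GridSearchCV fits 18 models
```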
Examples
This tries decision trees with different depths and split sizes using 3-fold cross-validation.
```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
param_grid = {'max_depth': [3, 5, 7], 'min_samples_split': [2, 4]}

grid = GridSearchCV(model, param_grid, cv=3)
grid.fit(X_train, y_train)
```
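After fitting, the winning settings and their cross-validation score can be read back from the fitted object. A minimal self-contained run on the built-in iris data (illustrative; the dataset choice is an assumption, not part of the example above):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

model = DecisionTreeClassifier(random_state=0)
param_grid = {'max_depth': [3, 5, 7], 'min_samples_split': [2, 4]}

grid = GridSearchCV(model, param_grid, cv=3)
grid.fit(X, y)

print(grid.best_params_)   # the winning combination
print(grid.best_score_)    # its mean cross-validation accuracy
```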
This tries different SVM settings with 5-fold cross-validation.
```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

model = SVC()
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}

grid = GridSearchCV(model, param_grid, cv=5)
grid.fit(X_train, y_train)
```
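SVMs are sensitive to feature scale, so in practice the model is often wrapped in a Pipeline with a scaler, and the grid keys are prefixed with the step name. A sketch of that pattern (the pipeline step names here are our own choice):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])

# Pipeline parameters are addressed as <step_name>__<parameter>
param_grid = {'svc__C': [0.1, 1, 10], 'svc__kernel': ['linear', 'rbf']}

grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```

This way the scaler is refit inside each cross-validation fold, which avoids leaking information from the validation fold into the training fold.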
Sample Program
This program loads the iris flower data, splits it, and uses GridSearchCV to find the best decision tree settings. Then it tests the best model and prints accuracy.
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load data
X, y = load_iris(return_X_y=True)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define model and parameters
dt = DecisionTreeClassifier(random_state=42)
param_grid = {'max_depth': [2, 3, 4], 'min_samples_split': [2, 3]}

# Set up GridSearchCV
grid_search = GridSearchCV(dt, param_grid, cv=3)

# Train with grid search
grid_search.fit(X_train, y_train)

# Best model and parameters
best_model = grid_search.best_estimator_
best_params = grid_search.best_params_

# Predict and evaluate
predictions = best_model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Best Parameters: {best_params}")
print(f"Test Accuracy: {accuracy:.3f}")
```
Important Notes
GridSearchCV can be slow on large grids or large datasets, since it fits one model per parameter combination per fold.
Set random_state on the model (and the train/test split) to get the same results every run.
You can check grid_search.cv_results_ to see scores for all tries.
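cv_results_ is a plain dict with one entry per combination tried. A short sketch of ranking all candidates, using a KNN model chosen just for illustration (n_jobs=-1 runs the folds on all CPU cores, which helps with the speed concern above):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

grid = GridSearchCV(KNeighborsClassifier(),
                    {'n_neighbors': [1, 3, 5, 7]},
                    cv=5, n_jobs=-1)  # n_jobs=-1 parallelizes the fits
grid.fit(X, y)

# One row per parameter combination, with its mean CV score and rank
for params, score, rank in zip(grid.cv_results_['params'],
                               grid.cv_results_['mean_test_score'],
                               grid.cv_results_['rank_test_score']):
    print(rank, params, round(score, 3))
```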
Summary
GridSearchCV helps find the best model settings automatically.
It tries all combinations you give and tests them with cross-validation.
Use it to improve your model's accuracy without guessing.