
How to Tune Decision Tree Hyperparameters in Python with sklearn

To tune a decision tree's hyperparameters in Python, use GridSearchCV from sklearn.model_selection with a parameter grid covering options such as max_depth, min_samples_split, and criterion. GridSearchCV tries every combination in the grid, scores each with cross-validation, and reports the best-performing settings.
📐 Syntax

Use GridSearchCV to search over hyperparameter combinations for DecisionTreeClassifier. Define a parameter grid with keys as hyperparameter names and values as lists of options to try.

  • DecisionTreeClassifier(): The decision tree model to tune.
  • param_grid: Dictionary mapping hyperparameter names to lists of values to try.
  • GridSearchCV(estimator, param_grid, cv): Evaluates every combination with cv-fold cross-validation.
  • fit(X, y): Runs the search, fitting each candidate on the training data.
  • best_params_: Dictionary of the best hyperparameters found.
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    'max_depth': [3, 5, 10, None],
    'min_samples_split': [2, 5, 10],
    'criterion': ['gini', 'entropy']
}

clf = DecisionTreeClassifier()
grid_search = GridSearchCV(clf, param_grid, cv=5)
grid_search.fit(X_train, y_train)  # X_train, y_train: your training data

print(grid_search.best_params_)
```
💻 Example

This example shows how to tune a decision tree classifier on the iris dataset using GridSearchCV. It prints the best hyperparameters and the accuracy on test data.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define parameter grid
param_grid = {
    'max_depth': [2, 3, 4, 5, None],
    'min_samples_split': [2, 3, 4],
    'criterion': ['gini', 'entropy']
}

# Create model and grid search
clf = DecisionTreeClassifier(random_state=42)
grid_search = GridSearchCV(clf, param_grid, cv=4)
grid_search.fit(X_train, y_train)

# Best parameters
print('Best hyperparameters:', grid_search.best_params_)

# Predict and evaluate
y_pred = grid_search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy on test set: {accuracy:.3f}')
```
Output

```
Best hyperparameters: {'criterion': 'gini', 'max_depth': 3, 'min_samples_split': 2}
Accuracy on test set: 0.978
```
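Beyond best_params_, the fitted search object exposes best_score_ (the mean cross-validated score of the winning candidate) and cv_results_ (per-candidate details), which are useful for checking how close the runners-up were. A minimal sketch on the same iris data; the smaller grid here is purely illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Illustrative grid: 3 depths x 2 criteria = 6 candidates
param_grid = {'max_depth': [2, 3, None], 'criterion': ['gini', 'entropy']}
grid_search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
grid_search.fit(X, y)

# Mean cross-validated accuracy of the best candidate
print(f'Best CV score: {grid_search.best_score_:.3f}')

# cv_results_ stores one entry per candidate; compare them side by side
for params, score in zip(grid_search.cv_results_['params'],
                         grid_search.cv_results_['mean_test_score']):
    print(params, f'{score:.3f}')
```

If two candidates score nearly the same, preferring the simpler one (shallower tree, larger min_samples_split) is a reasonable tiebreaker.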
⚠️ Common Pitfalls

Overfitting: Setting max_depth too high can make the tree memorize training data and perform poorly on new data.

Underfitting: Setting max_depth too low or min_samples_split too high can make the tree too simple to capture patterns.

Ignoring cross-validation: Not using cross-validation can lead to choosing hyperparameters that work only on training data.

Unnecessary preprocessing: Decision trees are insensitive to feature scaling, so adding a scaler to the pipeline wastes effort without improving the model.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

# Wrong: no cross-validation; the training score overestimates performance
clf = DecisionTreeClassifier(max_depth=10)
clf.fit(X_train, y_train)  # X_train, y_train from the earlier split
print('Training accuracy:', clf.score(X_train, y_train))

# Right: use GridSearchCV with cross-validation
param_grid = {'max_depth': [2, 5, 10]}
grid_search = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=5)
grid_search.fit(X_train, y_train)
print('Best max_depth:', grid_search.best_params_['max_depth'])
```
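The overfitting and underfitting pitfalls above can be made visible by comparing training and test accuracy across depths. A quick sketch on iris (the depth values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

results = {}
for depth in [1, 3, 20]:
    clf = DecisionTreeClassifier(max_depth=depth, random_state=42)
    clf.fit(X_train, y_train)
    # Record (training accuracy, test accuracy) for each depth
    results[depth] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(f'max_depth={depth}: train={results[depth][0]:.3f}, test={results[depth][1]:.3f}')
```

A depth-1 stump underfits (both scores are low), while a very deep tree fits the training set almost perfectly; a large gap between training and test accuracy is the signature of overfitting.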
📊 Quick Reference

Here are key hyperparameters to tune in DecisionTreeClassifier:

| Hyperparameter | Description | Typical Values |
| --- | --- | --- |
| max_depth | Maximum depth of the tree | None, 3, 5, 10 |
| min_samples_split | Minimum samples required to split a node | 2, 5, 10 |
| min_samples_leaf | Minimum samples required at a leaf node | 1, 2, 4 |
| criterion | Function to measure quality of a split | 'gini', 'entropy' |
| max_features | Number of features to consider per split | None, 'sqrt', 'log2' |
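The table above can be combined into a single grid, but note that grid size grows multiplicatively: 4 × 3 × 3 × 2 × 3 = 216 candidates here, each refit cv times. For larger grids, sklearn's RandomizedSearchCV samples a fixed number of combinations instead. A sketch on iris, using the typical values from the table:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# One grid covering every hyperparameter in the table (216 candidates)
param_grid = {
    'max_depth': [None, 3, 5, 10],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'criterion': ['gini', 'entropy'],
    'max_features': [None, 'sqrt', 'log2'],
}

grid_search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
grid_search.fit(X, y)
print(grid_search.best_params_)
```

On a small dataset like iris this runs quickly, but on real data, budget the number of fits (candidates × folds) before launching a search this large.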

Key Takeaways

  • Use GridSearchCV with cross-validation to find the best decision tree hyperparameters.
  • Tune max_depth and min_samples_split to balance underfitting and overfitting.
  • Evaluate model performance on a separate test set after tuning.
  • Decision trees do not require feature scaling.
  • Try different criteria like 'gini' and 'entropy' to improve splits.