How to Tune Decision Tree Hyperparameters in Python with sklearn
Use GridSearchCV from sklearn.model_selection with a parameter grid that includes options such as max_depth, min_samples_split, and criterion. GridSearchCV finds the best settings by testing every combination and evaluating model accuracy with cross-validation.

Syntax
Use GridSearchCV to search over hyperparameter combinations for DecisionTreeClassifier. Define a parameter grid with keys as hyperparameter names and values as lists of options to try.
- DecisionTreeClassifier(): The decision tree model.
- param_grid: Dictionary of hyperparameters to tune.
- GridSearchCV(estimator, param_grid, cv): Runs cross-validation over the grid to find the best parameters.
- fit(X, y): Trains the model on the data.
- best_params_: The best hyperparameters found.
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    'max_depth': [3, 5, 10, None],
    'min_samples_split': [2, 5, 10],
    'criterion': ['gini', 'entropy']
}

clf = DecisionTreeClassifier()
grid_search = GridSearchCV(clf, param_grid, cv=5)
grid_search.fit(X_train, y_train)
print(grid_search.best_params_)
```
Example
This example shows how to tune a decision tree classifier on the iris dataset using GridSearchCV. It prints the best hyperparameters and the accuracy on test data.
```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define parameter grid
param_grid = {
    'max_depth': [2, 3, 4, 5, None],
    'min_samples_split': [2, 3, 4],
    'criterion': ['gini', 'entropy']
}

# Create model and grid search
clf = DecisionTreeClassifier(random_state=42)
grid_search = GridSearchCV(clf, param_grid, cv=4)
grid_search.fit(X_train, y_train)

# Best parameters
print('Best hyperparameters:', grid_search.best_params_)

# Predict and evaluate
y_pred = grid_search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy on test set: {accuracy:.3f}')
```
Common Pitfalls
Overfitting: Setting max_depth too high can make the tree memorize training data and perform poorly on new data.
Underfitting: Setting max_depth too low or min_samples_split too high can make the tree too simple to capture patterns.
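The effect of both pitfalls can be seen by comparing cross-validated accuracy at different depths. A minimal sketch on the iris dataset (the depth values 1, 3, and None are illustrative choices, not recommendations):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Compare a very shallow, a moderate, and an unconstrained tree
results = {}
for depth in [1, 3, None]:
    clf = DecisionTreeClassifier(max_depth=depth, random_state=42)
    # Mean accuracy over 5 cross-validation folds on the training split
    results[depth] = cross_val_score(clf, X_train, y_train, cv=5).mean()
    print(f'max_depth={depth}: mean CV accuracy = {results[depth]:.3f}')
```

The depth-1 stump typically scores well below the deeper trees here, which is the underfitting case; on noisier datasets the unconstrained tree would in turn lag a moderately pruned one.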
Ignoring cross-validation: Not using cross-validation can lead to choosing hyperparameters that work only on training data.
Unnecessary scaling: Decision trees split on one feature at a time using thresholds, so they do not require feature scaling; adding a scaler complicates the pipeline without changing the result.
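A quick sketch illustrating why scaling is unnecessary for trees: fitting the same tree on raw and standardized iris features yields (up to floating-point tie-breaking) the same test accuracy, since standardization is a monotone transform of each feature:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Same tree, raw features
clf_raw = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
acc_raw = clf_raw.score(X_test, y_test)

# Same tree, standardized features
scaler = StandardScaler().fit(X_train)
clf_scaled = DecisionTreeClassifier(random_state=42).fit(scaler.transform(X_train), y_train)
acc_scaled = clf_scaled.score(scaler.transform(X_test), y_test)

print('raw:   ', acc_raw)
print('scaled:', acc_scaled)
```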
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

# Wrong: no cross-validation, just the training score
clf = DecisionTreeClassifier(max_depth=10)
clf.fit(X_train, y_train)
print('Training accuracy:', clf.score(X_train, y_train))

# Right: use GridSearchCV with cross-validation
param_grid = {'max_depth': [2, 5, 10]}
grid_search = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=5)
grid_search.fit(X_train, y_train)
print('Best max_depth:', grid_search.best_params_['max_depth'])
```
Quick Reference
Here are key hyperparameters to tune in DecisionTreeClassifier:
| Hyperparameter | Description | Typical Values |
|---|---|---|
| max_depth | Maximum depth of the tree | [None, 3, 5, 10] |
| min_samples_split | Minimum samples to split a node | [2, 5, 10] |
| min_samples_leaf | Minimum samples at a leaf node | [1, 2, 4] |
| criterion | Function to measure quality of split | ['gini', 'entropy'] |
| max_features | Number of features to consider at split | [None, 'sqrt', 'log2'] |
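The table above can be folded into a single search. A minimal sketch on the iris dataset (the grid values mirror the "Typical Values" column and are illustrative, not prescriptive; a grid this size fits 216 combinations, so expect it to be slower on larger datasets):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# One grid covering every hyperparameter in the quick-reference table
param_grid = {
    'max_depth': [None, 3, 5, 10],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'criterion': ['gini', 'entropy'],
    'max_features': [None, 'sqrt', 'log2'],
}

grid_search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=5,
    n_jobs=-1,  # use all cores; each fold/combination fits independently
)
grid_search.fit(X, y)

print('Best params:', grid_search.best_params_)
print(f'Best CV accuracy: {grid_search.best_score_:.3f}')
```

For much larger grids, RandomizedSearchCV samples a fixed number of combinations instead of trying all of them.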